[OAK-5446] leaseUpdateThread might be blocked by leaseUpdateCheck Created: 12/Jan/17  Updated: 26/Jan/17

Status: Open
Project: Jackrabbit Oak
Component/s: core
Affects Version/s: 1.4, 1.5.14
Fix Version/s: 1.6

Type: Bug Priority: Major
Reporter: Stefan Eissing Assignee: Julian Reschke
Resolution: Unresolved Votes: 0
Labels: candidate_oak_1_4
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File OAK-5446.diff     File OAK-5446-jr.diff     File OAK-5446.testcase     File OAK-5446.testcase.v3    
Issue Links:
Cloners
is cloned by OAK-5528 leaseUpdateThread might be blocked by... Open
Reference
is related to OAK-3399 5sec retry loop before declaring leas... Closed
is related to OAK-5523 LeaseUpdateRetryLoop: add test coverage Resolved

 Description   

Fighting with cluster nodes losing their lease and shutting down oak-core in a cloud environment. For reasons unknown at this point in time, the whole process seems to skip about two minutes of real time.

This is a situation from which Oak currently does not recover. Code analysis shows that ClusterNodeInfo is handed the LeaseCheckDocumentStoreWrapper instance to use as store. This is fatal, since any store action renewLease() tries to perform will first invoke performLeaseCheck(). Once the FailureMargin is reached, the lease check stalls the renewLease() thread for 5 retry attempts and then declares the lease lost.

The ClusterNodeInfo should instead be using the "real" DocumentStore, not the wrapped one, IMO.
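
For illustration, a minimal, hypothetical sketch (not the actual Oak classes; names and values are made up) of the pattern described above: every store call routed through such a wrapper first runs a lease check, which can sleep through several retries and then declare the lease lost, so a renewLease() that itself goes through the wrapper cannot succeed once the failure margin has been reached.

    // Hypothetical sketch only; class and constant names are invented.
    public class LeaseCheckingWrapperSketch {

        static final int MAX_RETRIES = 5;            // retry attempts, as described above
        static final long RETRY_SLEEP_MILLIS = 1000; // assumed sleep between retries

        volatile long leaseEndTime;                  // lease deadline in millis

        void performLeaseCheck() throws InterruptedException {
            for (int i = 0; i < MAX_RETRIES; i++) {
                if (System.currentTimeMillis() < leaseEndTime) {
                    return;                          // lease still valid, let the call through
                }
                Thread.sleep(RETRY_SLEEP_MILLIS);    // stall the calling thread, then retry
            }
            throw new IllegalStateException("lease lost"); // store access refused from here on
        }

        // every store operation made through the wrapper is gated by the check above,
        // including the very update that would renew the lease
        void findAndUpdate() throws InterruptedException {
            performLeaseCheck();
            // ... delegate to the real DocumentStore ...
        }
    }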



 Comments   
Comment by Julian Reschke [ 12/Jan/17 ]

Would be good to have a unit test (looking now).

FWIW, another way to avoid this would be to special-case the CLUSTER collection in the LeaseCheckDocumentStoreWrapper.
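
A rough sketch of that idea with simplified stand-in types (the real wrapper delegates the full DocumentStore interface): only calls that target other collections pay the lease-check toll, while reads of the cluster-nodes collection, which lease renewal itself needs, are passed straight through.

    // Rough sketch with stand-in types; not the actual LeaseCheckDocumentStoreWrapper.
    public class CollectionAwareLeaseWrapperSketch {

        enum Collection { NODES, CLUSTER_NODES, SETTINGS, JOURNAL }

        interface Store { String find(Collection collection, String key); }

        private final Store delegate;

        CollectionAwareLeaseWrapperSketch(Store delegate) { this.delegate = delegate; }

        String find(Collection collection, String key) {
            if (collection != Collection.CLUSTER_NODES) {
                performLeaseCheck();                 // may stall and eventually fail
            }
            return delegate.find(collection, key);   // cluster-node reads bypass the check
        }

        private void performLeaseCheck() { /* as sketched in the description */ }
    }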

Comment by Marcel Reutegger [ 16/Jan/17 ]

I agree, there should be a test. Looks like OAK-3399 didn't have one.

Comment by Julian Reschke [ 16/Jan/17 ]

I'll try to come up with a test case.

Comment by Julian Reschke [ 16/Jan/17 ]

Hmmm.

    public DocumentNodeStore(DocumentMK.Builder builder) {
        this.blobStore = builder.getBlobStore();
        this.statisticsProvider = builder.getStatisticsProvider();
        this.nodeStoreStatsCollector = builder.getNodeStoreStatsCollector();
        if (builder.isUseSimpleRevision()) {
            this.simpleRevisionCounter = new AtomicInteger(0);
        }
        DocumentStore s = builder.getDocumentStore();
        if (builder.getTiming()) {
            s = new TimingDocumentStoreWrapper(s);
        }
        if (builder.getLogging()) {
            s = new LoggingDocumentStoreWrapper(s);
        }
        if (builder.getReadOnlyMode()) {
            s = ReadOnlyDocumentStoreWrapperFactory.getInstance(s);
            readOnlyMode = true;
        } else {
            readOnlyMode = false;
        }
        this.executor = builder.getExecutor();
        this.clock = builder.getClock();

        int cid = builder.getClusterId();
        cid = Integer.getInteger("oak.documentMK.clusterId", cid);
        if (readOnlyMode) {
            clusterNodeInfo = ClusterNodeInfo.getReadOnlyInstance(s);
        } else {
            clusterNodeInfo = ClusterNodeInfo.getInstance(s, cid);
        }
        // TODO we should ensure revisions generated from now on
        // are never "older" than revisions already in the repository for
        // this cluster id
        cid = clusterNodeInfo.getId();

        if (builder.getLeaseCheck()) {
            s = new LeaseCheckDocumentStoreWrapper(s, clusterNodeInfo);
            clusterNodeInfo.setLeaseFailureHandler(builder.getLeaseFailureHandler());
        } else {
            clusterNodeInfo.setLeaseCheckDisabled(true);
        }

So the DocumentStore is passed into ClusterNodeInfo before it gets wrapped into a LeaseCheckDocumentStoreWrapper, no?

Comment by Julian Reschke [ 16/Jan/17 ]

But then, the lease update thread also does:

            // then, independently if the lease had to be updated or not, check
            // the status:
            if (nodeStore.updateClusterState()) {
                // then inform the discovery lite listener - if it is registered
                nodeStore.signalClusterStateChange();
            }

and, AFAICT, that indeed would go through the lease checking wrapper...

It's not hard to fix this, but it's not entirely clear what the best way is...

  • decouple updateClusterState from the lease renew thread – probably good because the latter is timing critical (see the sketch after this list)
  • make sure LeaseChecker only delays calls to a whitelisted set of collections (just NODES?)
  • make sure that (some?) background threads always bypass the LeaseCheckWrapper
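
A minimal sketch of the first option, assuming plain ScheduledExecutorService-based background jobs (the actual DocumentNodeStore background thread setup differs, and this is not the OAK-5446 patch): the lease renewal and the cluster-state check run on two independent schedulers, so a stalled lease check can never delay the timing-critical renewal.

    // Minimal sketch only; method names in the println calls are illustrative.
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SplitBackgroundThreadsSketch {

        public static void main(String[] args) {
            ScheduledExecutorService leaseUpdateThread =
                    Executors.newSingleThreadScheduledExecutor();
            ScheduledExecutorService clusterUpdateThread =
                    Executors.newSingleThreadScheduledExecutor();

            // timing critical: only renews the lease, ideally against the unwrapped store
            leaseUpdateThread.scheduleAtFixedRate(
                    () -> System.out.println("renewClusterIdLease()"), 0, 1, TimeUnit.SECONDS);

            // may go through the lease-check wrapper and block without harming renewal
            clusterUpdateThread.scheduleAtFixedRate(
                    () -> System.out.println("updateClusterState()"), 0, 1, TimeUnit.SECONDS);
        }
    }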

Also: is it really intentional that updateClusterState reads all cluster node infos once per second???

Stefan Egli Marcel Reutegger - feedback appreciated.

Comment by Julian Reschke [ 16/Jan/17 ]

Proposed patch.

Comment by Stefan Egli [ 25/Jan/17 ]

Code analysis shows that ClusterNodeInfo is handed the LeaseCheckDocumentStoreWrapper instance to use as store.

I second Julian Reschke's finding that ClusterNodeInfo gets the 'real' DocumentStore, not the lease wrapper. Stefan Eissing, do you disagree?

But then, the lease update thread also does [nodeStore.updateClusterState()]

good point, that one goes via the LeaseCheckDocumentStoreWrapper, while it probably should not, agreed.

It's not hard to fix this, but it's not entirely clear what the best way is...

I would try to come up with the least intrusive change, which would mean leaving the lease wrapper as is (i.e. no whitelist or special-case bypasses). Ideally, updateClusterState would be aware of the lease wrapper and use the underlying (real) documentStore instead. Perhaps the DocumentNodeStore could keep track of that real documentStore for use in updateClusterState?
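
A sketch of that suggestion with invented field and method names (not the actual DocumentNodeStore API): remember the unwrapped store before the lease wrapper is applied, and let updateClusterState() read the cluster node infos through it so the call can never be stalled by the lease check.

    // Sketch only; field and method names are invented for illustration.
    public class DocumentNodeStoreSketch {

        interface Store { /* stand-in for DocumentStore */ }

        static class LeaseCheckWrapper implements Store {
            final Store delegate;
            LeaseCheckWrapper(Store delegate) { this.delegate = delegate; }
        }

        private final Store nonLeaseCheckingStore;   // kept from before wrapping
        private final Store store;                   // possibly the lease-checking wrapper

        DocumentNodeStoreSketch(Store raw, boolean leaseCheckEnabled) {
            this.nonLeaseCheckingStore = raw;
            this.store = leaseCheckEnabled ? new LeaseCheckWrapper(raw) : raw;
        }

        boolean updateClusterState() {
            // read all cluster node info documents via the unwrapped store
            Store s = nonLeaseCheckingStore;
            // ... scan cluster node documents via s, update the local view ...
            return false;   // true would mean the cluster state changed
        }
    }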

Also: is it really intentional that updateClusterState reads all cluster node infos once per second???

Yes, that is intentional. It is part of discovery lite where each instance checks the state of all the cluster node infos once a second to notice a crashing instance asap. We can discuss if the 1/second resolution is too much or if it should be done in a separate thread, but the discovery lite mechanism is based on this.
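
For illustration only (this is not the discovery-lite implementation; types and field names are simplified stand-ins), the mechanism amounts to a once-per-second scan over all cluster node info documents, checking each lease end time so a crashed instance is noticed within roughly a second:

    // Illustrative sketch; the real code reads the documents from the DocumentStore.
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ClusterStatePollingSketch {

        static class ClusterNodeInfoDoc {
            int clusterId;
            long leaseEndTime;   // millis; lease considered expired once this has passed
        }

        static void checkAll(List<ClusterNodeInfoDoc> infos) {
            long now = System.currentTimeMillis();
            for (ClusterNodeInfoDoc info : infos) {
                boolean active = info.leaseEndTime > now;
                System.out.println("cluster node " + info.clusterId + " active=" + active);
            }
        }

        public static void main(String[] args) {
            final List<ClusterNodeInfoDoc> infos = Collections.emptyList(); // would be re-read each pass
            Executors.newSingleThreadScheduledExecutor()
                    .scheduleAtFixedRate(() -> checkAll(infos), 0, 1, TimeUnit.SECONDS);
        }
    }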

Comment by Stefan Eissing [ 25/Jan/17 ]

I second Julian Reschke's finding that ClusterNodeInfo gets the 'real' DocumentStore, not the lease wrapper. Stefan Eissing, do you disagree?

No, I agree. I saw in the logs that the lease update thread used the LeaseWrapper and was blocked, but it is the call inside updateClusterState() that caused this.

So, we all agree that this should be changed. As to how, I have no strong opinion. Perhaps the whole updateClusterState should go into the ClusterNodeInfo?

Comment by Julian Reschke [ 25/Jan/17 ]

Reminder: there's a patch available. The most important part, IMHO, is to separate the two operations into separate background threads.

Comment by Stefan Egli [ 25/Jan/17 ]

+1 for the patch. Separating the cluster update from the lease update - even though these two operations are about a related concern - is a good thing. (sorry, missed reviewing the patch previously)

Comment by Julian Reschke [ 25/Jan/17 ]

Stefan Egli & Marcel Reutegger - we are ok with this being done before we branch 1.6, right?

Comment by Stefan Egli [ 25/Jan/17 ]

we are ok with this being done before we branch 1.6

it is a change in a quite central part, so the question is whether this is indeed a blocker for 1.6 or whether it can wait... We'd have to test the fix thoroughly.

Comment by Stefan Egli [ 25/Jan/17 ]

Added a simplistic unit test that covers the retry loop (OAK-5446.testcase) - might be good enough as a start?

Comment by Julian Reschke [ 25/Jan/17 ]

Verified that the test tests the right thing by setting MAX_RETRY_SLEEPS_BEFORE_LEASE_FAILURE = 0 in ClusterNodeInfo (in which case the test fails).

Proposal: add the test as part of OAK-3399 (after removing the trailing whitespace).

We still don't have a test that simulates the issue described in this ticket, though.

Comment by Julian Reschke [ 25/Jan/17 ]

Modified test that delays the read from clusterNodes and indeed reproduces the issue.

Comment by Stefan Egli [ 25/Jan/17 ]

ack, I'll add the test case to OAK-3399 (in trunk and the 1.4 branch) and will look into how we can simulate a 'VM freeze' during lease update.

Comment by Stefan Egli [ 25/Jan/17 ]

Modified test that delays the read from clusterNodes and indeed reproduces the issue.

even better - can I leave the patch with you, Julian Reschke?

Comment by Stefan Egli [ 25/Jan/17 ]

just one comment: maybe we should have two flavours of the test: one with the delay and one without - as both cases seem useful.

Comment by Julian Reschke [ 25/Jan/17 ]

While the test currently proves that there is a problem, I'm not totally sure that the Thread.sleep is correct here – shouldn't I use the VirtualClock?

Comment by Stefan Egli [ 25/Jan/17 ]

I think actually delaying is correct, as a) the background thread doesn't use the virtual clock anyway, and b) you want to simulate a 'VM freeze', for which you need to actually sleep.
But I think the delaying should only happen after the first successful renewal (to keep the idea of that test method) - so I've added a variant which enables delaying after forwarding the clock (OAK-5446.testcase.v3).
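
For reference, the shape of such a test, with stand-in types rather than the attached OAK-5446.testcase.v3: perform one successful renewal first, then inject a real delay into the store access to simulate the VM freeze, and check that a later renewal still completes instead of ending in a declared lease failure.

    // Sketch only; LeaseClient and its methods are invented stand-ins, not Oak API.
    public class LeaseRenewDelayTestSketch {

        interface LeaseClient {
            boolean renewLease();                  // stand-in for ClusterNodeInfo.renewLease()
            void setReadDelayMillis(long millis);  // hypothetical hook delaying store reads
        }

        static void run(LeaseClient client) {
            // 1. the first renewal must succeed with no delay in place
            if (!client.renewLease()) {
                throw new AssertionError("initial renewal failed");
            }
            // 2. now delay reads of the cluster node document ("VM freeze"); a real
            //    sleep is needed because the background thread ignores the virtual clock
            client.setReadDelayMillis(10_000);
            // 3. the renewal should still complete (possibly late) rather than running
            //    into the lease-check retry loop and reporting a lease failure
            if (!client.renewLease()) {
                throw new AssertionError("renewal failed after simulated freeze");
            }
        }
    }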

Comment by Julian Reschke [ 26/Jan/17 ]

...guess what: in the proposed patch I copied (instead of moved) the cluster update check...

Comment by Stefan Egli [ 26/Jan/17 ]

oops, yes, that would not have helped then...

Comment by Julian Reschke [ 26/Jan/17 ]

See r1780424 - for some reason I currently cannot resolve this ticket, though.

Comment by Julian Reschke [ 26/Jan/17 ]

Note I had to clone the issue to work around a JIRA problem.
