Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The distributed lock service is configured to avoid releasing locks while the cache is closing. If the closing cache holds any primary bucket locks, this can delay cache operations on those buckets until the cache is completely closed and the DistributedSystem is disconnected.
I've seen this take over 30 seconds, causing client connections to time out on the server side and clients to fail over from one server to another, only to be blocked by the same issue on those servers.
Another thing I observed at the same time is that AcceptorImpl sends profile updates for all partitioned regions. Those profile updates take as long as 2 seconds apiece to process. This also delays the election of new primary bucket owners, and it's unnecessary because DestroyPartitionRegion messages are sent later that remove the profiles from the other servers.
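To make the lock-ordering effect concrete, here is a minimal, self-contained sketch. It is not Geode code and not necessarily how this was fixed: the class `PrimaryLockCloseSketch`, the method `newPrimaryElectedDuringClose`, and the use of a `Semaphore` as a stand-in for a primary bucket lock are all assumptions made for illustration. It only shows that another member cannot take over as primary while the closing member keeps the lock until the end of its shutdown, whereas releasing the lock first lets the takeover proceed immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PrimaryLockCloseSketch {

    /**
     * Simulates one member closing while a second member tries to become
     * primary for the same bucket. Returns true if the second member got
     * the lock before the close finished. All names are hypothetical.
     */
    static boolean newPrimaryElectedDuringClose(boolean releaseEarly) {
        // Stand-in for a single primary bucket lock (not a Geode API).
        Semaphore primaryLock = new Semaphore(1);
        CountDownLatch closeStarted = new CountDownLatch(1);
        CountDownLatch finishClose = new CountDownLatch(1);
        try {
            primaryLock.acquire(); // the closing member is currently primary

            Thread closer = new Thread(() -> {
                if (releaseEarly) {
                    primaryLock.release(); // give up primary before the slow shutdown work
                }
                closeStarted.countDown();
                try {
                    finishClose.await(); // models the long cache close / disconnect
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (!releaseEarly) {
                    primaryLock.release(); // lock only freed once the close is complete
                }
            });
            closer.start();
            closeStarted.await();

            // Another member tries to take over as primary while the close is in progress.
            boolean elected = primaryLock.tryAcquire(200, TimeUnit.MILLISECONDS);

            finishClose.countDown(); // let the simulated close complete
            closer.join();
            return elected;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("release before close work: " + newPrimaryElectedDuringClose(true));
        System.out.println("release after close work:  " + newPrimaryElectedDuringClose(false));
    }
}
```

In this toy model the first call reports `true` and the second `false`. The 200 ms wait stands in for a client-side timeout; in the scenario described above, the real delay was over 30 seconds.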