Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
This scenario was found while adding a GII (getInitialImage) test case for PR (partitioned region) clear. The sequence is:
(1) There are three servers: server1 is an accessor; server2 and server3 are datastores.
(2) Shut down server2.
(3) Send a PR clear from server1 (the accessor) and restart server2 at the same time. Because of this race, server2 does not receive the PartitionedRegionClearMessage.
(4) server2 finishes GII.
(5) Only server3 receives the PartitionedRegionClearMessage, and it hosts all the primary buckets. While the PR clear thread iterates through these primary buckets one by one, some of them may lose primary status to server2.
(6) BR.cmnClearRegion returns immediately because the bucket is no longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still called. From the caller's point of view, this bucket is therefore cleared, and no PartitionedRegionPartialClearException is thrown.
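The following Java sketch models steps (5) and (6) to show how the bucket gets misreported. It is a hypothetical, single-threaded simplification: `Bucket`, its `primary` flag and `clearLocalPrimaries` are illustrative stand-ins for BucketRegion, its primary state and the PR clear loop, not Geode's actual classes or signatures. Bucket 1 starts out non-primary to represent a bucket that lost primary to server2 before the loop reached it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the reported bug; names are illustrative, not Geode's API.
public class BuggyClearSketch {

  // Stand-in for a BucketRegion: tracks only its id and whether this member is still primary.
  static final class Bucket {
    final int id;
    volatile boolean primary;
    boolean cleared;

    Bucket(int id, boolean primary) {
      this.id = id;
      this.primary = primary;
    }

    // Models BR.cmnClearRegion: silently returns when the bucket is no longer primary.
    void cmnClearRegion() {
      if (!primary) {
        return;
      }
      cleared = true;
    }
  }

  // Mirrors the bug: the bucket id is added to clearedBuckets unconditionally.
  static List<Integer> clearLocalPrimaries(List<Bucket> localPrimaries) {
    List<Integer> clearedBuckets = new ArrayList<>();
    for (Bucket bucket : localPrimaries) {
      bucket.cmnClearRegion();
      clearedBuckets.add(bucket.id); // reached even when nothing was cleared
    }
    return clearedBuckets;
  }

  public static void main(String[] args) {
    List<Bucket> buckets =
        List.of(new Bucket(0, true), new Bucket(1, false), new Bucket(2, true));
    List<Integer> reported = clearLocalPrimaries(buckets);
    System.out.println("reported cleared: " + reported); // reported cleared: [0, 1, 2]
    System.out.println("bucket 1 actually cleared: " + buckets.get(1).cleared); // false
  }
}
```

Running `main` reports bucket 1 as cleared even though its `cleared` flag stays false, which is the silent success described in step (6).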
The problem is that a bucket is counted as cleared even when cmnClearRegion skipped it. Before calling cmnClearRegion, we should call BR.doLockForPrimary to make sure the bucket is still primary, and throw an exception if it is not. Then clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for this bucket.
The expected behavior in this scenario is to throw PartitionedRegionPartialClearException.
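A minimal sketch of the proposed fix and the expected caller-side outcome, using a hypothetical model rather than Geode's real classes: `lockForPrimary()` stands in for BR.doLockForPrimary (actual locking and unlocking are omitted), `NotPrimaryException` is a made-up exception type, and `PartialClearException` stands in for PartitionedRegionPartialClearException. It also assumes the caller detects a partial clear by comparing the bucket IDs reported as cleared with the buckets it expected to clear.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposed fix; names are illustrative, not Geode's API.
public class FixedClearSketch {

  // Stand-in for PartitionedRegionPartialClearException.
  static final class PartialClearException extends RuntimeException {
    PartialClearException(String message) {
      super(message);
    }
  }

  // Hypothetical exception signalling that a bucket lost primary before it was cleared.
  static final class NotPrimaryException extends RuntimeException {
    NotPrimaryException(int bucketId) {
      super("bucket " + bucketId + " is no longer primary");
    }
  }

  static final class Bucket {
    final int id;
    volatile boolean primary;
    boolean cleared;

    Bucket(int id, boolean primary) {
      this.id = id;
      this.primary = primary;
    }

    // Models BR.doLockForPrimary: reports whether this member still holds the primary.
    boolean lockForPrimary() {
      return primary;
    }

    void cmnClearRegion() {
      cleared = true;
    }
  }

  // Checks primary status first and throws instead of silently returning.
  static void clearBucket(Bucket bucket) {
    if (!bucket.lockForPrimary()) {
      throw new NotPrimaryException(bucket.id);
    }
    bucket.cmnClearRegion();
  }

  static List<Integer> clearLocalPrimaries(List<Bucket> localPrimaries) {
    List<Integer> clearedBuckets = new ArrayList<>();
    for (Bucket bucket : localPrimaries) {
      try {
        clearBucket(bucket);
        clearedBuckets.add(bucket.id); // only reached when the clear actually ran
      } catch (NotPrimaryException e) {
        // Skip: this bucket must not be reported as cleared.
      }
    }
    return clearedBuckets;
  }

  // Caller side: any expected bucket missing from the result means a partial clear.
  static void clearAll(List<Bucket> localPrimaries) {
    Set<Integer> expected = new HashSet<>();
    for (Bucket bucket : localPrimaries) {
      expected.add(bucket.id);
    }
    expected.removeAll(clearLocalPrimaries(localPrimaries));
    if (!expected.isEmpty()) {
      throw new PartialClearException("buckets not cleared: " + expected);
    }
  }

  public static void main(String[] args) {
    List<Bucket> buckets =
        List.of(new Bucket(0, true), new Bucket(1, false), new Bucket(2, true));
    try {
      clearAll(buckets);
      System.out.println("full clear");
    } catch (PartialClearException e) {
      System.out.println(e.getMessage()); // buckets not cleared: [1]
    }
  }
}
```

Catching the exception per bucket lets the loop continue with the remaining primaries; letting it propagate out of the loop would also keep the bucket out of clearedBuckets, but would skip the rest. The proposal above does not say which behavior is intended.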