Geode / GEODE-7665 Ability to clear a Partitioned Region / GEODE-9191

PR clear could miss clearing a bucket that lost primary

Details

    Description

      This scenario was found while introducing a GII test case for PR clear. The sequence is:

      (1) There are 3 servers: server1 is an accessor; server2 and server3 are datastores.
      (2) Shut down server2.
      (3) Send PR clear from server1 (the accessor) and restart server2 at the same time. There is a race in which server2 does not receive the PartitionedRegionClearMessage.
      (4) server2 finishes GII.
      (5) Only server3 received the PartitionedRegionClearMessage, and it hosts all the primary buckets. While the PR clear thread iterates through these primary buckets one by one, some of them might lose primary to server2.
      (6) BR.cmnClearRegion returns immediately for a bucket that is no longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still called. So from the caller's point of view this bucket was cleared, and PartitionedRegionPartialClearException is not even thrown.

      The problem is:
      Nothing verifies that the bucket is still primary before cmnClearRegion is called. We should call BR.doLockForPrimary first to make sure it is still primary and, if it is not, throw an exception. Then clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for this bucket.
      The expected behavior in this scenario is to throw PartitionedRegionPartialClearException.
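
The difference between the reported and the expected behavior can be sketched with a small standalone model. This is only an illustration, not Geode code: `Bucket`, `clearIfPrimary`, `PartialClearException`, `clearUnchecked`, and `clearChecked` are hypothetical stand-ins, and the plain `primary` flag check stands in for whatever BR.doLockForPrimary does (the sketch does not model locking or concurrent primary movement).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the clear loop; none of these types are Geode classes.
class PartialClearSketch {

  // Stand-in for a bucket region holding one entry.
  static class Bucket {
    final int id;
    final boolean primary;
    final List<String> entries = new ArrayList<>();

    Bucket(int id, boolean primary) {
      this.id = id;
      this.primary = primary;
      entries.add("value-" + id);
    }

    // Models the reported cmnClearRegion behavior: a silent no-op when not primary.
    void clearIfPrimary() {
      if (!primary) {
        return;
      }
      entries.clear();
    }
  }

  // Stand-in for PartitionedRegionPartialClearException.
  static class PartialClearException extends RuntimeException {
    PartialClearException(String message) {
      super(message);
    }
  }

  // Reported behavior: the bucket id is recorded whether or not anything was cleared.
  static Set<Integer> clearUnchecked(List<Bucket> buckets) {
    Set<Integer> clearedBuckets = new LinkedHashSet<>();
    for (Bucket bucket : buckets) {
      bucket.clearIfPrimary();
      clearedBuckets.add(bucket.id);
    }
    return clearedBuckets;
  }

  // Expected behavior: check primary first and fail before the id is recorded.
  static Set<Integer> clearChecked(List<Bucket> buckets) {
    Set<Integer> clearedBuckets = new LinkedHashSet<>();
    for (Bucket bucket : buckets) {
      if (!bucket.primary) {
        throw new PartialClearException("bucket " + bucket.id + " is no longer primary");
      }
      bucket.clearIfPrimary();
      clearedBuckets.add(bucket.id);
    }
    return clearedBuckets;
  }

  public static void main(String[] args) {
    // Bucket 1 lost primary to another member before the loop reached it.
    List<Bucket> buckets = Arrays.asList(new Bucket(0, true), new Bucket(1, false));
    System.out.println("unchecked reports cleared: " + clearUnchecked(buckets));
    System.out.println("bucket 1 entries left: " + buckets.get(1).entries.size());

    List<Bucket> again = Arrays.asList(new Bucket(0, true), new Bucket(1, false));
    try {
      clearChecked(again);
    } catch (PartialClearException e) {
      System.out.println("checked clear failed: " + e.getMessage());
    }
  }
}
```

In the unchecked loop the caller sees bucket 1 in the cleared set even though its entry is still there, which matches step (6). In the checked loop the failure surfaces before the id is recorded, which is the behavior this issue asks for.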

            People

              zhouxj Xiaojian Zhou