Geode / GEODE-10286

cache close in response to a forced disconnect with persistent regions may skip some cleanup


Details

    Description

      During a cache close, persistent regions may not clean up as much as they should. When the PersistenceAdvisor is closed, a CancelException thrown from its close is not handled, which causes the remaining parts of the region close to be skipped. I think the place to handle it is DistributedRegion.distributedRegionCleanup (DistributedRegion.java:2564). Here is an exception showing what it looks like when this happens:

      org.apache.geode.distributed.DistributedSystemDisconnectedException: Distribution manager on rs-RunItNow-ZH1504a1i3xlarge-hydra-client-10(dataStoregemfire2_host1_421:421)<ec><v22>:41004 started at Wed Mar 23 17:11:48 PDT 2022: Member isn't responding to heartbeat requests, caused by org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
              at org.apache.geode.distributed.internal.ClusterDistributionManager$Stopper.generateCancelledException(ClusterDistributionManager.java:2893)
              at org.apache.geode.distributed.internal.InternalDistributedSystem$Stopper.generateCancelledException(InternalDistributedSystem.java:1177)
              at org.apache.geode.CancelCriterion.checkCancelInProgress(CancelCriterion.java:83)
              at org.apache.geode.distributed.internal.ClusterElderManager.getElderId(ClusterElderManager.java:76)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderId(ClusterDistributionManager.java:2085)
              at org.apache.geode.distributed.internal.locks.DLockService.getElderId(DLockService.java:254)
              at org.apache.geode.distributed.internal.locks.DLockService.notLockGrantorId(DLockService.java:824)
              at org.apache.geode.distributed.internal.locks.DLockService.unlock(DLockService.java:1807)
              at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.releaseTieLock(PersistenceAdvisorImpl.java:1181)
              at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.close(PersistenceAdvisorImpl.java:1158)
              at org.apache.geode.internal.cache.DistributedRegion.distributedRegionCleanup(DistributedRegion.java:2564)
              at org.apache.geode.internal.cache.DistributedRegion.postDestroyRegion(DistributedRegion.java:2657)
              at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2732)
              at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6241)
              at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1834)
              at org.apache.geode.internal.cache.LocalRegion.handleCacheClose(LocalRegion.java:7320)
              at org.apache.geode.internal.cache.DistributedRegion.handleCacheClose(DistributedRegion.java:2691)
              at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2308)
              at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2154)
              at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1538)
              at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2545)
              at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2408)
              at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1254)
              at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2329)
              at org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1190)
              at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$uncleanShutdownDS$0(GMSMembership.java:1793)
              at java.base/java.lang.Thread.run(Thread.java:833)
      Caused by: org.apache.geode.ForcedDisconnectException: Member isn't responding to heartbeat requests
              at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2319)
              ... 3 more
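
One possible shape of the handling suggested above, as a minimal standalone sketch rather than the actual Geode change. The types here (`CancelException`, `PersistenceAdvisor`) are simplified stand-ins for the real Geode classes, and the names of the later cleanup steps are hypothetical. The point is only that catching the cancellation around the advisor close lets the rest of the cleanup run.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: none of these types are the real Geode classes.
class RegionCleanupSketch {

  // Simplified stand-in for org.apache.geode.CancelException.
  static class CancelException extends RuntimeException {
    CancelException(String message) {
      super(message);
    }
  }

  // Stand-in for the advisor whose close() can fail during a forced disconnect.
  interface PersistenceAdvisor {
    void close();
  }

  // Runs the cleanup steps and returns the names of the steps that were reached.
  static List<String> distributedRegionCleanup(PersistenceAdvisor advisor) {
    List<String> completed = new ArrayList<>();
    try {
      advisor.close();
      completed.add("persistenceAdvisor.close");
    } catch (CancelException e) {
      // The system is already shutting down; record the cancellation and
      // fall through so the remaining local cleanup still runs.
      completed.add("persistenceAdvisor.close (cancelled)");
    }
    // Hypothetical later steps that would be skipped if the exception escaped.
    completed.add("closeOtherAdvisors");
    completed.add("releaseLocalResources");
    return completed;
  }

  public static void main(String[] args) {
    // Simulate the advisor failing the way the stack trace above shows.
    PersistenceAdvisor failing = () -> {
      throw new CancelException("Member isn't responding to heartbeat requests");
    };
    System.out.println(distributedRegionCleanup(failing));
  }
}
```

Whether the cancellation should be logged, swallowed silently, or handled inside the advisor's own close is a design choice this sketch does not settle.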
      

          People

            jinmeiliao Jinmei Liao
            dschneider Darrel Schneider