Geode / GEODE-6517

Race condition: a node can fail to shut down because it is stuck in PRHARedundancyProvider.waitForPersistentBucketRecovery()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.9.0
    • Component/s: regions
    • Labels: None

    Description

      Stack of the hung thread:

      "Shutdown Disconnector1" #93 prio=10 os_prio=0 tid=0x00007f84b8002800 nid=0x6875 waiting on condition [0x00007f844ee31000]
         java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for <0x00000000f14f0490> (a java.util.concurrent.CountDownLatch$Sync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.apache.geode.internal.cache.PRHARedundancyProvider.waitForPersistentBucketRecovery(PRHARedundancyProvider.java:2019)
          at org.apache.geode.internal.cache.PartitionedRegion.postDestroyRegion(PartitionedRegion.java:7536)
          at org.apache.geode.internal.cache.LocalRegion.recursiveDestroyRegion(LocalRegion.java:2707)
          at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6308)
          at org.apache.geode.internal.cache.LocalRegion.handleCacheClose(LocalRegion.java:7387)
          at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2281)
          - locked <0x00000000f0abeb00> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1593)
          - locked <0x00000000f0abeb00> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1255)
          at org.apache.geode.management.internal.cli.functions.ShutDownFunction.lambda$disconnectInNonDaemonThread$0(ShutDownFunction.java:78)
          at org.apache.geode.management.internal.cli.functions.ShutDownFunction$$Lambda$94/665093117.run(Unknown Source)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

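      The race occurs in recoverPersistentBuckets. In the window between the following latch being created and it being set to null, the shutdown thread can obtain a reference to the latch. If the field is then nulled without the latch being counted down, the shutdown thread waits for countDown forever.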
      allBucketsRecoveredFromDisk = new CountDownLatch(proxyBucketArray.length);
      try {
        if (proxyBucketArray.length > 0) {
          this.redundancyLogger = new RedundancyLogger(this);
          Thread loggingThread = new LoggingThread(
              "RedundancyLogger for region " + this.prRegion.getName(), false, this.redundancyLogger);
          loggingThread.start();
        }
      } catch (RuntimeException e) {
        allBucketsRecoveredFromDisk = null;
        throw e;
      }
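
      The interleaving described above can be replayed in a reduced model. The sketch below is not Geode code: LatchRaceSketch, Provider, and every method name in it are hypothetical stand-ins, and the bounded await replaces the unbounded CountDownLatch.await() seen in the stack trace so that the example terminates. It also shows one possible mitigation, draining the latch before dropping the reference, which is an assumption for illustration and not necessarily the change made for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Reduced model of the race. All names are hypothetical, not Geode classes. */
public class LatchRaceSketch {

  /** Stand-in for the object that owns the recovery latch. */
  static class Provider {
    private volatile CountDownLatch allBucketsRecoveredFromDisk;
    private final boolean releaseWaitersOnFailure;

    Provider(boolean releaseWaitersOnFailure) {
      this.releaseWaitersOnFailure = releaseWaitersOnFailure;
    }

    /** Models the start of recovery: the latch becomes visible to other threads. */
    void beginRecovery(int bucketCount) {
      allBucketsRecoveredFromDisk = new CountDownLatch(bucketCount);
    }

    /** Models the catch block: recovery failed and the field is cleared. */
    void failRecovery() {
      CountDownLatch latch = allBucketsRecoveredFromDisk;
      allBucketsRecoveredFromDisk = null;
      if (releaseWaitersOnFailure && latch != null) {
        // Assumed mitigation: drain the latch so that a thread which already
        // holds the old reference is released instead of parking forever.
        while (latch.getCount() > 0) {
          latch.countDown();
        }
      }
    }

    /** Models the shutdown thread reading the field before waiting on it. */
    CountDownLatch latchSeenByShutdown() {
      return allBucketsRecoveredFromDisk;
    }
  }

  /**
   * Replays the interleaving in a fixed order on one thread:
   * shutdown reads the latch, recovery fails, shutdown waits.
   * Returns true if the wait is released, false if it times out.
   */
  static boolean shutdownCompletes(boolean releaseWaitersOnFailure) {
    Provider provider = new Provider(releaseWaitersOnFailure);
    provider.beginRecovery(3);
    CountDownLatch seen = provider.latchSeenByShutdown(); // shutdown grabs the reference
    provider.failRecovery();                               // recovery clears the field
    try {
      // Bounded wait so the sketch terminates; the real code waits without a timeout.
      return seen.await(200, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("clear field only: " + shutdownCompletes(false));
    System.out.println("drain latch, then clear: " + shutdownCompletes(true));
  }
}
```

      With the field only cleared, the waiter still holds a latch whose count never reaches zero, so the bounded wait times out and the first line prints false. With the latch drained first, the second line prints true. A timed await in the shutdown path would be another way to bound the wait, at the cost of choosing a timeout.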

            People

              Assignee: Eric Shu (eshu)
              Reporter: Eric Shu (eshu)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h