Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
A system was found hung with a thread in this state:
vm_0_bridge1_w1-gst-dev23_17330:ServerConnection on port 21566 Thread 352 ID=861 state=TIMED_WAITING
  waiting to lock <java.util.concurrent.CountDownLatch$Sync@41c71f7d>
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
  at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:55)
  at com.gemstone.gemfire.internal.util.concurrent.FutureResult.get(FutureResult.java:54)
  at com.gemstone.gemfire.distributed.internal.locks.DLockService.waitForLockGrantorFutureResult(DLockService.java:774)
  at com.gemstone.gemfire.distributed.internal.locks.DLockService.notLockGrantorId(DLockService.java:837)
  at com.gemstone.gemfire.distributed.internal.locks.DLockService.releaseTryLocks(DLockService.java:2216)
  at com.gemstone.gemfire.internal.cache.locks.TXLockServiceImpl.release(TXLockServiceImpl.java:222)
  locked <java.util.ArrayList@34bbf1cf>
  at com.gemstone.gemfire.internal.cache.TXLockRequest.releaseDistributed(TXLockRequest.java:91)
  at com.gemstone.gemfire.internal.cache.TXLockRequest.cleanup(TXLockRequest.java:120)
  at com.gemstone.gemfire.internal.cache.TXState.cleanup(TXState.java:730)
  at com.gemstone.gemfire.internal.cache.TXState.commit(TXState.java:447)
  at com.gemstone.gemfire.internal.cache.TXStateProxyImpl.commit(TXStateProxyImpl.java:234)
  at com.gemstone.gemfire.internal.cache.TXManagerImpl.commit(TXManagerImpl.java:325)
No other threads were trying to find the lock grantor, and further testing showed that the FutureResult this thread was waiting on must have been cancelled by another thread. I wrote a unit test and found that FutureResult does not respect its cancelled state in its get() methods.
Commit 8ace5128f4c33db4df3acef888d975cb08a3601f in geode's branch refs/heads/develop from bschuchardt
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=8ace512 ]
GEODE-2228 FutureResult.get() does not check for cancellation prior to waiting for a result

Adding cancellation checks to the get() methods in FutureResult. This
prevents the TXLockService from hanging waiting for a FutureResult from
another thread when none will ever appear.
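
The kind of cancellation check described here can be sketched as follows. This is a minimal illustration and not the code from the commit: CancellableResult is a hypothetical stand-in for a latch-backed future, and having cancel() also release the latch is an assumption of this sketch, made so that threads already blocked in get() wake up.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical latch-backed result holder; NOT Geode's FutureResult. */
class CancellableResult<V> {
  private final CountDownLatch latch = new CountDownLatch(1);
  private volatile boolean cancelled;
  private volatile V value;

  /** Publishes the value and releases any waiting threads. */
  void set(V newValue) {
    value = newValue;
    latch.countDown();
  }

  /**
   * Marks the result as cancelled. Releasing the latch here is a choice of
   * this sketch so that threads already blocked in get() are woken.
   */
  boolean cancel() {
    if (latch.getCount() == 0) {
      return false; // a value was already set, or cancel() already ran
    }
    cancelled = true;
    latch.countDown();
    return true;
  }

  boolean isCancelled() {
    return cancelled;
  }

  /** Waits for a value, failing fast if the result was cancelled. */
  V get() throws InterruptedException {
    throwIfCancelled(); // check before waiting
    latch.await();
    throwIfCancelled(); // check again: cancel() may have woken us
    return value;
  }

  /** Timed variant with the same cancellation checks. */
  V get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
    throwIfCancelled();
    if (!latch.await(timeout, unit)) {
      throw new TimeoutException("no result within " + timeout + " " + unit);
    }
    throwIfCancelled();
    return value;
  }

  private void throwIfCancelled() {
    if (cancelled) {
      throw new CancellationException("result was cancelled");
    }
  }
}
```

With checks like these, a caller waiting on a cancelled result receives a CancellationException and can retry or give up, rather than parking on a latch that nobody will ever count down.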