Uploaded image for project: 'Geode'
  1. Geode
  2. GEODE-8204

CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion hang

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.10.0
    • None
    • None

    Description

      https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/217#A run timed out.
      The dunit-hangs.txt shows two tests not finished, and I think the ClientServerMiscBCDUnitTest giiEventQueueFromOldToCurrentMemberShouldSucceed is not really hang. It is just a little hard to read the stack dumps as they are for both test runs.
      Started @ 2020-05-29 14:11:25.687 +0000
      2020-05-29 14:59:03.911 +0000 org.apache.geode.internal.cache.tier.sockets.ClientServerMiscBCDUnitTest giiEventQueueFromOldToCurrentMemberShouldSucceed[12]
      Ended @ 2020-05-29 14:59:03.911 +0000

      Started @ 2020-05-29 14:11:25.635 +0000
      2020-05-29 14:46:47.615 +0000 org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion[from_v1.9.2,
      Ended @ 2020-05-29 14:56:25.635 +0000

      There are multiple DLock requesters for LockGrantor during PartitionedRepositoryManager.computeRepository().

      "Pooled Waiting Message Processor 40" #163 daemon prio=5 os_prio=0 tid=0x00007fc94c043000 nid=0x955c waiting on condition [0x00007fc9d07ad000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000000e351c5c0> (a java.util.concurrent.CountDownLatch$Sync)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
              at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
              at org.apache.geode.internal.util.concurrent.FutureResult.get(FutureResult.java:69)
              at org.apache.geode.distributed.internal.locks.DLockService.waitForLockGrantorFutureResult(DLockService.java:750)
              at org.apache.geode.distributed.internal.locks.DLockService.getLockGrantorId(DLockService.java:301)
              at org.apache.geode.distributed.internal.locks.DLockService.lockInterruptibly(DLockService.java:1445)
              at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1241)
              at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1232)
              at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1227)
              at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1223)
              at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:105)
              at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:65)
              at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
              at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
              at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager$$Lambda$308/437574105.apply(Unknown Source)
              at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
              - locked <0x00000000e35149f0> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
              at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:162)
              at org.apache.geode.cache.lucene.internal.LuceneBucketListener.lambda$afterPrimary$0(LuceneBucketListener.java:40)
              at org.apache.geode.cache.lucene.internal.LuceneBucketListener$$Lambda$307/1978619753.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors.doWaitingThread(ClusterOperationExecutors.java:411)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors$$Lambda$86/1114445410.invoke(Unknown Source)
              at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
              at org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$82/1488044985.run(Unknown Source)
              at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
              - <0x00000000e3596b58> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      

      The remote process that should processing the requests are being closed and is wait for GrantorRequestProcessor.clearGrantor to finish:

      "RMI TCP Connection(1)-172.17.0.30" #32 daemon prio=5 os_prio=0 tid=0x00007f8178001800 nid=0x939b waiting on condition [0x00007f826f269000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000000e1982990> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
              at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
              at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lockInterruptibly(StoppableReentrantLock.java:85)
              at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lock(StoppableReentrantLock.java:71)
              at org.apache.geode.distributed.internal.ClusterElderManager.initializeElderState(ClusterElderManager.java:102)
              at org.apache.geode.distributed.internal.ClusterElderManager.getElderState(ClusterElderManager.java:97)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:2091)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:227)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:336)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:319)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:310)
              at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:874)
              at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1249)
              - locked <0x00000000fbd65f58> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
              at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:649)
              at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2336)
              at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2256)
              - locked <0x00000000fbc85130> (a java.lang.Object)
              at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2195)
              at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:84)
              at org.apache.geode.internal.cache.GemFireCacheImpl.destroyPartitionedRegionLockService(GemFireCacheImpl.java:2042)
              at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2296)
              - locked <0x00000000e2282030> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
              at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2117)
              at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1985)
              at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1975)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeTestBase.closeCache(LuceneSearchWithRollingUpgradeTestBase.java:755)
              at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeTestBase$12.run2(LuceneSearchWithRollingUpgradeTestBase.java:568)
              at org.apache.geode.cache30.CacheSerializableRunnable.run(CacheSerializableRunnable.java:55)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
              at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
              at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
              at sun.rmi.transport.Transport$1.run(Transport.java:200)
              at sun.rmi.transport.Transport$1.run(Transport.java:197)
              at java.security.AccessController.doPrivileged(Native Method)
              at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
              at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
              at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
              at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
              at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$15/1032579838.run(Unknown Source)
              at java.security.AccessController.doPrivileged(Native Method)
              at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
              - <0x00000000fb3d0db0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
              - <0x00000000fbec63a8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
      

      The thread holding the lock is still in ElderInitProcessor.init. I can not tell why it is not able to finish it.

      "Pooled Message Processor 2" #124 daemon prio=10 os_prio=0 tid=0x00007f8100034000 nid=0x9401 waiting on condition [0x00007f82648cc000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00000000fb403e98> (a java.util.concurrent.CountDownLatch$Sync)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
              at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
              at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
              at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
              at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
              at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
              at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:71)
              at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:53)
              at org.apache.geode.distributed.internal.ClusterElderManager.lambda$new$0(ClusterElderManager.java:41)
              at org.apache.geode.distributed.internal.ClusterElderManager$$Lambda$76/1497552839.get(Unknown Source)
              at org.apache.geode.distributed.internal.ClusterElderManager.initializeElderState(ClusterElderManager.java:106)
              at org.apache.geode.distributed.internal.ClusterElderManager.getElderState(ClusterElderManager.java:97)
              at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:2091)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.basicProcess(GrantorRequestProcessor.java:504)
              at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.process(GrantorRequestProcessor.java:497)
              at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
              at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors.doProcessingThread(ClusterOperationExecutors.java:389)
              at org.apache.geode.distributed.internal.ClusterOperationExecutors$$Lambda$81/548873202.invoke(Unknown Source)
              at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
              at org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$82/1455748764.run(Unknown Source)
              at java.lang.Thread.run(Thread.java:748)
      
         Locked ownable synchronizers:
              - <0x00000000fb437ee0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
              - <0x00000000e1982990> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            eshu Eric Shu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: