Geode / GEODE-9075

Thread stuck indefinitely when using Istio/Sidecar


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • None
    • None
    • None

    Description

      A Geode cluster is deployed in a Kubernetes environment, with Istio sidecar proxies injected between the cluster members. While traffic is running, if any Istio sidecar is restarted, a thread can get stuck indefinitely while waiting for a reply to a message it has sent.

      After detailed analysis, it seems that when the proxy restarts, the message is in some cases lost, and the sending side waits indefinitely for a reply. On the sending side, a "reset connection" or "EOF" is received on the sending socket after the message has been sent.
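The failure mode described above can be illustrated with a minimal Java sketch. It assumes a hypothetical `ReplyWaiter` class (not part of Geode's API) in which the reader thread that observes EOF or a connection reset on the sending socket also releases the waiting sender, so the sender can resend or fail the operation instead of blocking indefinitely. This shows one possible mitigation idea under that assumption, not how Geode actually processes replies.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Geode's API: a sender-side waiter for one reply that is
// released either by the reply or by a connection failure seen after the send.
public class ReplyWaiter {
  private final CountDownLatch released = new CountDownLatch(1);
  private volatile boolean replied;
  private volatile boolean connectionLost;

  // Called by the reader thread when the reply message arrives.
  public void onReply() {
    replied = true;
    released.countDown();
  }

  // Called by the reader thread when the socket reports EOF or a connection reset
  // after the message was sent (assumption: the reply can no longer arrive on it).
  public void onConnectionLost() {
    connectionLost = true;
    released.countDown();
  }

  // Waits at most timeoutMillis. Returns true only if the reply arrived; false on
  // timeout, connection loss, or interrupt, so the caller can resend or fail.
  public boolean awaitReply(long timeoutMillis) {
    try {
      return released.await(timeoutMillis, TimeUnit.MILLISECONDS) && replied;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public boolean isConnectionLost() {
    return connectionLost;
  }

  public static void main(String[] args) {
    // Proxy restart: the reply is lost, but the sending socket sees EOF/reset.
    ReplyWaiter lost = new ReplyWaiter();
    new Thread(lost::onConnectionLost).start();
    System.out.println("lost: replied=" + lost.awaitReply(5_000)
        + ", connectionLost=" + lost.isConnectionLost());

    // Normal case: the reply arrives.
    ReplyWaiter ok = new ReplyWaiter();
    new Thread(ok::onReply).start();
    System.out.println("ok: replied=" + ok.awaitReply(5_000)
        + ", connectionLost=" + ok.isConnectionLost());
  }
}
```

Running `main` prints `lost: replied=false, connectionLost=true` followed by `ok: replied=true, connectionLost=false`. The key design choice is that a connection failure counts as a wake-up event for the waiter; without it, the lost-reply case can only end through a timeout.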

       

      [warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread <64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for <53.897 seconds> and number of thread monitor iteration <1>
      Thread Name <Function Execution Processor2> state <TIMED_WAITING>
      Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
      Executor Group <FunctionExecutionPooledExecutor>
      Monitored metric <ResourceManagerStats.numThreadsStuck>
      Thread stack:
      sun.misc.Unsafe.park(Native Method)
      java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
      java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
      java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
      org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
      org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
      org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
      org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
      org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
      org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
      org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
      org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)

      ...
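The stack trace shows the thread in `TIMED_WAITING` inside `CountDownLatch.await` (via `tryAcquireSharedNanos`), called from `ReplyProcessor21.waitForRepliesUninterruptibly`. One plausible reading is that a thread can sit in a timed wait and still be stuck indefinitely if it keeps re-entering timed waits until the reply arrives. The following sketch illustrates that pattern with a hypothetical `waitInSlices` helper; it is not Geode's code, and the `maxSlices` bound exists only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Geode's ReplyProcessor21: waiting for a reply in
// bounded time slices. Each slice is a timed await, so a thread dump shows
// TIMED_WAITING, yet a lost reply never releases the latch.
public class SlicedReplyWait {

  // Returns the slice in which the reply arrived, or -1 if it never did.
  // An unbounded version of this loop (no maxSlices) would wait forever.
  public static int waitInSlices(CountDownLatch reply, long sliceMillis, int maxSlices) {
    boolean interrupted = false;
    try {
      for (int slice = 1; slice <= maxSlices; slice++) {
        try {
          if (reply.await(sliceMillis, TimeUnit.MILLISECONDS)) {
            return slice; // the reply arrived during this slice
          }
        } catch (InterruptedException e) {
          interrupted = true; // "uninterruptibly": defer the interrupt, keep waiting
          continue;
        }
        System.out.println("still waiting for a reply after slice " + slice);
      }
      return -1; // reachable only because of the demo bound
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // restore the deferred interrupt
      }
    }
  }

  public static void main(String[] args) {
    CountDownLatch lost = new CountDownLatch(1); // never counted down: reply lost
    System.out.println("lost reply -> " + waitInSlices(lost, 100, 3));

    CountDownLatch arrived = new CountDownLatch(1);
    arrived.countDown(); // reply already received
    System.out.println("reply arrived -> " + waitInSlices(arrived, 100, 3));
  }
}
```

With the lost reply, the helper prints three "still waiting" lines and returns -1; with the delivered reply it returns 1 immediately. Periodic timeouts of this kind make the stall visible (as the ThreadsMonitor warning does) but do not by themselves recover the lost message.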

            People

              Assignee: Mario Ivanac (mivanac)
              Reporter: Mario Ivanac (mivanac)
              Votes: 0
              Watchers: 1