Uploaded image for project: 'ActiveMQ Artemis'
  1. ActiveMQ Artemis
  2. ARTEMIS-4522

Flow control may get delayed significantly on occupied pool

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 2.32.0
    • None
    • None

    Description

      Slack discussion with jbertram see here:
      https://the-asf.slack.com/archives/CFL910J30/p1699520764243169

      Initial Description

      consumers-window-size set to 0 and there some queues, that are heavily used and have consumers attached, that allow for a concurrency of 500 messages (larger than thread-pool-client-size).
      After some time - unclear what triggers it - the behavior becomes bad in the way that from those queues only 1-5 messages are fetched and immediately processed, but then there is a pause of 10 seconds before the next 1-5 messages are fetched. This continues.
      In the thread-dumps, several like the following can be seen:

      "Thread-0 (ActiveMQ-client-global-threads)" #640 daemon prio=5 os_prio=0 cpu=899719.36ms elapsed=354425.44s tid=0x00007f904c02e850 nid=0x150d29 runnable  [0x00007f8fdbae5000]
         java.lang.Thread.State: TIMED_WAITING (parking)
      	at jdk.internal.misc.Unsafe.park(java.base@17.0.5/Native Method)
      	- parking to wait for  <0x00000003f6c9d1d0> (a org.apache.activemq.artemis.utils.AbstractLatch$CountSync)
      	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.5/LockSupport.java:252)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.5/AbstractQueuedSynchronizer.java:717)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@17.0.5/AbstractQueuedSynchronizer.java:1074)
      	at org.apache.activemq.artemis.utils.AbstractLatch.await(AbstractLatch.java:115)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.startSlowConsumer(ClientConsumerImpl.java:869)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:1025)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1154)
      	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
      	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
      	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
      	at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$2008/0x00000007ce20c000.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.5/ThreadPoolExecutor.java:1136)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.5/ThreadPoolExecutor.java:635)
      	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
      

      which corresponds to code location with a 10 seconds sleep here:
      https://github.com/apache/activemq-artemis/blob/main/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/client/impl/ClientConsumerImpl.java#L854
      It seems it always needs to wait until the 10 seconds are reached, which is unexpected, since the latch should count-down before, after having sent the credit to the server. We could assume some bug, that maybe increments the latch, but misses to decrement, so it is not able to reach 0 ever again, but that's speculative.

      Evaluation

      See slack for details, it boils down to the thread-pool in use being out of free threads, so the slow-consumer-handling is not able to run in good time.

      Affected versions: all since at least 2.22.0 up to current - probably since "ever", involved code derives from HornetQ.

      Possible Solutions

      We experimented a bit and found a solution for us - or actually 2 variants.

      and jbertram said:

      I think either one would be fine. Create a Jira and send a PR.

      So here is the Jira and I will push No. 2 as a pull-request: https://github.com/apache/activemq-artemis/pull/4699

      Attachments

        Activity

          People

            Unassigned Unassigned
            riconeubauer Rico Neubauer
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h 40m
                2h 40m