Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 5.5.0
    • Fix Version/s: NEEDS_REVIEW
    • Component/s: Broker
    • Labels:
      None
    • Environment:

      Linux and Windows

      Description

      Sometimes the messages stuck in queue and after restart of activemq then this messages redeliver to consumer. There are a lot of of topics:

      http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

      and this some of them:

      http://activemq.2283324.n4.nabble.com/Stuck-messages-Dispatch-issues-td2367852.html
      http://activemq.2283324.n4.nabble.com/Messages-stuck-in-queue-td3244342.html

      1. debug.zip
        333 kB
        Edwin Yu
      2. activemq-jdbc.xml
        4 kB
        Erkan
      3. activemq.xml
        2 kB
        Erkan
      4. activemq.log
        427 kB
        Erkan

        Activity

        Hide
        Gary Tully added a comment -

        removing this from the 5.8 critical list (blocking a release) pending a test case that can reproduce.

        Show
        Gary Tully added a comment - removing this from the 5.8 critical list (blocking a release) pending a test case that can reproduce.
        Hide
        Edwin Yu added a comment -

        Hi Claus, I spent a day writing a unit test. The unit test performed very well and seemed solid. The issue only happens in our production system, where we also embedded Mule, Quartz, Spring, Hibernate, etc. It's hard to simulate the right environment in a unit test. Also, when I allocated more memory to the JVM, and switched to use parallel gc, the issue seemed to happen lot less frequently. I tend to think that when a long full GC kicks in and application threads are stopped momentarily, it may trigger this issue to occur, but I cannot prove it.
        Regards.

        Show
        Edwin Yu added a comment - Hi Claus, I spent a day writing a unit test. The unit test performed very well and seemed solid. The issue only happens in our production system, where we also embedded Mule, Quartz, Spring, Hibernate, etc. It's hard to simulate the right environment in a unit test. Also, when I allocated more memory to the JVM, and switched to use parallel gc, the issue seemed to happen lot less frequently. I tend to think that when a long full GC kicks in and application threads are stopped momentarily, it may trigger this issue to occur, but I cannot prove it. Regards.
        Hide
        Claus Ibsen added a comment -

        Any update on a test case?

        Show
        Claus Ibsen added a comment - Any update on a test case?
        Hide
        Edwin Yu added a comment -

        @Gary, yes, org.apache.activemq.broker.region.cursors.AbstractStoreCursor#storeHasMessages is false in my case. When I manually set this variable to true in my debugging session, the execution continued into KahaDBStore but didn't get any message back from the BTree (in sd.orderIndex).
        I will try to create a test case hopefully to reproduce this issue easily.
        Best regards.

        Show
        Edwin Yu added a comment - @Gary, yes, org.apache.activemq.broker.region.cursors.AbstractStoreCursor#storeHasMessages is false in my case. When I manually set this variable to true in my debugging session, the execution continued into KahaDBStore but didn't get any message back from the BTree (in sd.orderIndex). I will try to create a test case hopefully to reproduce this issue easily. Best regards.
        Hide
        Gary Tully added a comment -

        @Edwin, if you can produce a junit test case that reproduces >60% of the time, it would be a great start.

        Show
        Gary Tully added a comment - @Edwin, if you can produce a junit test case that reproduces >60% of the time, it would be a great start.
        Hide
        Gary Tully added a comment -

        @Edwin

        you are in the right space. is it org.apache.activemq.broker.region.cursors.AbstractStoreCursor#storeHasMessages that is false?

        Show
        Gary Tully added a comment - @Edwin you are in the right space. is it org.apache.activemq.broker.region.cursors.AbstractStoreCursor#storeHasMessages that is false?
        Hide
        Edwin Yu added a comment -

        Hello, I encountered this issue as well with ActiveMQ 5.6.0. I attached some debugging screenshots to this issue.
        In my case, the messages are consumed fine during normal operation. It seems when JVM becomes busy doing gc, message(s) can be left in the queue forever until the server is restarted. It's not 100% reproducible but does occur. In the JConsole screenshot, you see ConsumerCount=1, InFlightCount=0, and QueueSize=1. A message is left unconsumed.
        I debugged the ActiveMQ library code. Please see the debug-1.png and debug-2.png. A thread regularly iterates through the queue, finds the pending messages, and tries to page in messages from kaha db store by filling a batch. However, AbstractStoreCursor.fillBatch() doesn't execute doFillBatch(). And the execution thread finishes without dispatching the pending message to the consumer.
        I don't know what the expected behavior should be, and I don't know if I were debugging the wrong part of the ActiveMQ code. ActiveMQ engineers, please let me know where else I should debug to help identify the problem. Thanks.

        Show
        Edwin Yu added a comment - Hello, I encountered this issue as well with ActiveMQ 5.6.0. I attached some debugging screenshots to this issue. In my case, the messages are consumed fine during normal operation. It seems when JVM becomes busy doing gc, message(s) can be left in the queue forever until the server is restarted. It's not 100% reproducible but does occur. In the JConsole screenshot, you see ConsumerCount=1, InFlightCount=0, and QueueSize=1. A message is left unconsumed. I debugged the ActiveMQ library code. Please see the debug-1.png and debug-2.png. A thread regularly iterates through the queue, finds the pending messages, and tries to page in messages from kaha db store by filling a batch. However, AbstractStoreCursor.fillBatch() doesn't execute doFillBatch(). And the execution thread finishes without dispatching the pending message to the consumer. I don't know what the expected behavior should be, and I don't know if I were debugging the wrong part of the ActiveMQ code. ActiveMQ engineers, please let me know where else I should debug to help identify the problem. Thanks.
        Hide
        Edwin Yu added a comment -

        Attach screenshot of my code debugging session.

        Show
        Edwin Yu added a comment - Attach screenshot of my code debugging session.
        Hide
        Erkan added a comment -

        In activemq xml we refence to the activemq jdbc.xml. The both configuration files are from server side not client side.

        Show
        Erkan added a comment - In activemq xml we refence to the activemq jdbc.xml. The both configuration files are from server side not client side.
        Hide
        Johno Crawford added a comment -

        Can you provide your activemq.xml?

        Show
        Johno Crawford added a comment - Can you provide your activemq.xml?
        Hide
        Erkan added a comment -

        I uploded the log file with which 124 messages stuck. The messages stuck only after a failover process.

        Show
        Erkan added a comment - I uploded the log file with which 124 messages stuck. The messages stuck only after a failover process.
        Hide
        Erkan added a comment -

        By this log file 124 messages was stuck.

        Show
        Erkan added a comment - By this log file 124 messages was stuck.
        Hide
        Gary Tully added a comment -

        I wonder if this is your problem https://issues.apache.org/jira/browse/AMQ-3568

        Show
        Gary Tully added a comment - I wonder if this is your problem https://issues.apache.org/jira/browse/AMQ-3568
        Hide
        Erkan added a comment -

        Gary,

        I will do that until friday and then upload it.

        Show
        Erkan added a comment - Gary, I will do that until friday and then upload it.
        Hide
        Gary Tully added a comment -

        Erkan, can you attach the log files from a 5.6-SNAPSHOT.
        Add the following to your log4j.properties:

        log4j.logger.org.apache.activemq.broker.region.cursors.AbstractStoreCursor=TRACE
        log4j.logger.org.apache.activemq.broker.region.PrefetchSubscription=TRACE
        Show
        Gary Tully added a comment - Erkan, can you attach the log files from a 5.6-SNAPSHOT. Add the following to your log4j.properties: log4j.logger.org.apache.activemq.broker.region.cursors.AbstractStoreCursor=TRACE log4j.logger.org.apache.activemq.broker.region.PrefetchSubscription=TRACE
        Hide
        Erkan added a comment - - edited

        I tested it with the 5.6-SNAPSHOT and it is the same effect. I get the following warnings:

        WARN | Transport failed: java.io.EOFException
        WARN | Transport failed: java.net.SocketException: Connection reset | org.apache.activemq.broker.TransportConnection.Transport
        WARN | Transport failed: java.net.SocketException: Software caused connection abort: socket write error | org.apache.activemq.broker.TransportConnection.Transport Async Exception Handler
        WARN | Duplicate message add attempt rejected. Destination:

        Can be the duplicate messages the reason so that the duplicate messages can not be consume so they stuck in queue?

        Show
        Erkan added a comment - - edited I tested it with the 5.6-SNAPSHOT and it is the same effect. I get the following warnings: WARN | Transport failed: java.io.EOFException WARN | Transport failed: java.net.SocketException: Connection reset | org.apache.activemq.broker.TransportConnection.Transport WARN | Transport failed: java.net.SocketException: Software caused connection abort: socket write error | org.apache.activemq.broker.TransportConnection.Transport Async Exception Handler WARN | Duplicate message add attempt rejected. Destination: Can be the duplicate messages the reason so that the duplicate messages can not be consume so they stuck in queue?
        Hide
        Sree Panchajanyam D added a comment -

        Do you notice any errors on ActiveMQ or consumer?

        Show
        Sree Panchajanyam D added a comment - Do you notice any errors on ActiveMQ or consumer?
        Hide
        Gary Tully added a comment -

        you have got to ask the computer. Do a test with that new option enabled using a 5.6-SNAPSHOT

        Show
        Gary Tully added a comment - you have got to ask the computer. Do a test with that new option enabled using a 5.6-SNAPSHOT
        Hide
        Erkan added a comment -
        Show
        Erkan added a comment - Can be that the reason? https://issues.apache.org/jira/browse/AMQ-2484
        Hide
        Erkan added a comment -

        I extend it but that didn't help.

        Show
        Erkan added a comment - I extend it but that didn't help.
        Hide
        Sree Panchajanyam D added a comment -

        if you are starting the system as a java process from a .sh or a .bat supply xms and xmx parameters. http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html this could yu better info.
        If you are starting it from an IDE like eclipse supply VM parameters in run configurations.
        I am not sure about .Net clients.

        Show
        Sree Panchajanyam D added a comment - if you are starting the system as a java process from a .sh or a .bat supply xms and xmx parameters. http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html this could yu better info. If you are starting it from an IDE like eclipse supply VM parameters in run configurations. I am not sure about .Net clients.
        Hide
        Erkan added a comment -

        Can you please say how we can increase the consumer heap size?

        Show
        Erkan added a comment - Can you please say how we can increase the consumer heap size?
        Hide
        Sree Panchajanyam D added a comment -

        Malcolm it would be great if you could report these two bugs on a different issue/thread. These are diffent from messages not getting dequeued.

        Show
        Sree Panchajanyam D added a comment - Malcolm it would be great if you could report these two bugs on a different issue/thread. These are diffent from messages not getting dequeued.
        Hide
        Sree Panchajanyam D added a comment -

        Please increase the consumer heap size to solve this issue. If you have huge number of producers run concurrent consumers.

        Show
        Sree Panchajanyam D added a comment - Please increase the consumer heap size to solve this issue. If you have huge number of producers run concurrent consumers.
        Hide
        Pavithra added a comment -

        We see a similar problem.

        Show
        Pavithra added a comment - We see a similar problem.
        Hide
        malcolm davis added a comment -

        We see a similar problem.

        Queue Block: (only seen in production, unable to recreate)

        • Producer: Transactional, Session.CLIENT_ACKNOWLEDGE
        • Queue: Single consumer thread reading the queue.
        • The queue blocks when under load (receive()) does not work. We are not sure if this is a memory or CPU issue.
        • The queue will eventually recover.

        Queue going to a single consumer on restart: (easy to recreate)

        • Create a queue with multiple consumers.
        • Stick a lot of messages on the queue, with consumers slowly pulling messages.
        • Restart ActiveMQ while messages are still on the queue.
        • All existing messages on the queue go to a single consumer.
        • New messages pushed to the queue are distributed to the other consumers as expected.
        Show
        malcolm davis added a comment - We see a similar problem. Queue Block: (only seen in production, unable to recreate) Producer: Transactional, Session.CLIENT_ACKNOWLEDGE Queue: Single consumer thread reading the queue. The queue blocks when under load (receive()) does not work. We are not sure if this is a memory or CPU issue. The queue will eventually recover. Queue going to a single consumer on restart: (easy to recreate) Create a queue with multiple consumers. Stick a lot of messages on the queue, with consumers slowly pulling messages. Restart ActiveMQ while messages are still on the queue. All existing messages on the queue go to a single consumer. New messages pushed to the queue are distributed to the other consumers as expected.
        Hide
        Gary Tully added a comment -

        If you can produce a test case, there is some chance that we can investigate, without that it is impossible to know what is going on.

        Show
        Gary Tully added a comment - If you can produce a test case, there is some chance that we can investigate, without that it is impossible to know what is going on.
        Hide
        Erkan added a comment -

        I don't have any idea how we can resolve this problem.

        Show
        Erkan added a comment - I don't have any idea how we can resolve this problem.

          People

          • Assignee:
            Unassigned
            Reporter:
            Erkan
          • Votes:
            14 Vote for this issue
            Watchers:
            17 Start watching this issue

            Dates

            • Created:
              Updated:

              Development