ActiveMQ / AMQ-2584

Message store is not cleaned when durable topic subscribers are refusing messages

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.3.0, 5.3.1, 5.4.0
    • Fix Version/s: 5.4.2
    • Component/s: Message Store
    • Labels:
      None
    • Environment:

      WinXP,
      java version "1.6.0_05"
      Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
      Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)

      Description

      Hi,

      I am using ActiveMQ 5.3 (also the 5.4 and 5.3.1 snapshots) with KahaDB in the following use case:

      • 3 durable topic subscribers, each of which refuses messages using session.recover(), with 1 delivery attempt
      • an ActiveMQ.DLQ consumer
      • a persistent-message topic producer

      In such a case the dead letter consumer should consume every message sent: as soon as the number of delivery attempts is reached, the message is sent to ActiveMQ.DLQ. The result is OK, but at the end the KahaDB data directory contains every db-<number>.log file ever created. They aren't deleted even after some time.
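      For reference, a minimal broker configuration of the kind used here might look like the following (the attached activemq.xml has the actual setup; broker name and directory below are illustrative):

      <broker xmlns="http://activemq.apache.org/schema/core" brokerName="localhost" dataDirectory="${activemq.base}/data">
        <persistenceAdapter>
          <!-- the db-<number>.log journal files accumulate in this directory -->
          <kahaDB directory="${activemq.base}/data/kahadb"/>
        </persistenceAdapter>
        <transportConnectors>
          <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
        </transportConnectors>
      </broker>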

      I can also see the following message in the console:

      WARN | Duplicate message add attempt rejected. Message id: ID:sk1d069c-3826-1264006781626-0:0:1:1:13425

      If the use case is altered to use a queue instead of a topic, the log files are periodically deleted and no WARN messages appear in the console.

      The same behaviour (data files not cleaned) occurs if amqPersistenceAdapter is used, just without the WARN messages.

      1. 2584_test.zip
        10 kB
        Juraj Kuruc
      2. AMQ2584Test.java
        8 kB
        Stan Lewis
      3. AMQ2584Test.java
        8 kB
        Gary Tully
      4. UpdatedTestCase.patch
        9 kB
        Stan Lewis

        Activity

        Gary Tully added a comment -

        on a dead letter strategy, e.g.:

        <policyEntry ...>
          <deadLetterStrategy>
            <individualDeadLetterStrategy queuePrefix="Test.DLQ." processNonPersistent="true" enableAudit="false" />
          </deadLetterStrategy>
        </policyEntry>
        Carey Flichel added a comment -

        >> new DeadLetterStrategy enableAudit attribute introduced in http://svn.apache.org/viewvc?rev=1145092&view=rev

        Gary, will this be configurable on the individualDeadLetterStrategy/sharedDeadLetterStrategy element or will it be derived from a policyEntry setting?

        Lars Hvile added a comment - edited

        I moved the messages using the admin interface (conf/jetty.xml).

        >> Any chance you have a test case?
        This happens almost every time a message is moved back to the DLQ the 2nd time. Seems a little bit random though..

        Using the new setting in #1145092 we can avoid message loss now, but shouldn't that be the default? I mean, if people are bothered by dups in the DLQ wouldn't it be better for them to enable the audit?

        Btw, when will this fix be released?

        Gary Tully added a comment -

        new DeadLetterStrategy enableAudit attribute introduced in http://svn.apache.org/viewvc?rev=1145092&view=rev
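        A sketch of how the attribute might be configured, assuming it is also exposed on sharedDeadLetterStrategy (the default strategy) in the same way as on the individualDeadLetterStrategy shown above:

        <policyEntry topic=">">
          <deadLetterStrategy>
            <sharedDeadLetterStrategy enableAudit="false"/>
          </deadLetterStrategy>
        </policyEntry>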

        Gary Tully added a comment - edited

        Lars, yes, the audit should be configurable, such that it is possible to disable it. I will expose the option on the default DLQ strategy.

        btw: How do you achieve 2), is it a manual process or do you use a camel route or something else? Any chance you have a test case?

        The addition of https://issues.apache.org/jira/browse/AMQ-3003 provides an alternative way of avoiding duplicate submission to the DLQ in the scenario described in this issue.

        Lars Hvile added a comment -

        I'm having a problem where messages fail to enter the DLQ. My scenario is:

        1. some message fails during processing and ends up on the DLQ
        2. it's moved back to its original queue, but fails again
        3. message not sent back to the DLQ

        After some googling I found this issue, and I suspect you may have introduced a bug in AbstractDeadLetterStrategy by adding the duplication-checks. (http://svn.apache.org/viewvc/activemq/trunk/activemq-core/src/main/java/org/apache/activemq/broker/region/policy/AbstractDeadLetterStrategy.java?r1=1030013&r2=1030012&pathrev=1030013).

        I'm seeing this debug log entry when messages are lost:
        2011-07-09 14:59:07,177 | DEBUG | rollback: TX:ID:ubuntu-48335-1310214645889-0:1:18 syncCount: 2 | org.apache.activemq.transaction.LocalTransaction | ActiveMQ Transport: tcp:///127.0.0.1:59188
        2011-07-09 14:59:07,178 | DEBUG | Not adding duplicate to DLQ: ID:ubuntu-58129-1310214649907-0:1:9:1:5, dest: queue://xxxx | org.apache.activemq.broker.region.policy.AbstractDeadLetterStrategy | ActiveMQ Transport: tcp:///127.0.0.1:59188

        Gary Tully added a comment -

        Final resolution is to suppress duplicate dispatch to the DLQ at source, in the DeadLetterStrategy implementation, by having it keep an audit. In this way, only one copy of a message is dispatched to the DLQ, irrespective of how many durable consumers reject the message. AMQ-3003 will help with that.
        Most duplicates are trapped by the store, but when the store has already acked the original a duplicate can get through; the potential problem is the pending cursor of messages to dispatch. When dispatch is concurrent with the store, this can result in a missed dispatch and a subsequent missing ack, leaving a dangling message.
        Suppressing the duplicate send in the dead letter strategy removes this possibility.
        Fix in r1030013.

        Gary Tully added a comment -

        additionally,

        concurrentStoreAndDispatchQueues=false

        must be added to the kahaDB persistence adapter to allow it to correctly ack the duplicates that ensue. Otherwise there can be the occasional missed ack. This needs more investigation to see if it can be made to work reliably with the default value.
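        In activemq.xml that would look something like the following (directory path illustrative):

        <persistenceAdapter>
          <kahaDB directory="${activemq.base}/data/kahadb" concurrentStoreAndDispatchQueues="false"/>
        </persistenceAdapter>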

        Gary Tully added a comment -

        A workaround for this issue, where the DLQ can receive duplicates, is to disable the cursor message audit for the DLQ.

        <policyEntry queue="ActiveMQ.DLQ" enableAudit="false" />
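        For context, this entry would typically sit inside the broker's destinationPolicy in activemq.xml, for example:

        <destinationPolicy>
          <policyMap>
            <policyEntries>
              <policyEntry queue="ActiveMQ.DLQ" enableAudit="false" />
            </policyEntries>
          </policyMap>
        </destinationPolicy>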

        An enhancement to allow a DLQ per durable subscription would make it possible to track, on an individual basis, which durable subscription rejected a message:
        https://issues.apache.org/activemq/browse/AMQ-3003

        Stan Lewis added a comment -

        Looks like there's still a case where this issue can occur, namely if the consumers and the DLQ consumer consume messages concurrently. Here's a patch that updates the existing test, which will then show this other case.

        Gary Tully added a comment -

        Got to the bottom of this. Both stores left dangling references to data files when a journal add was suppressed as a duplicate by the index or reference store.
        In the AMQ case, the bug was a little subtler, as it only occurred if a batched write to the journal was pending when the duplicate occurred.

        Stan Lewis added a comment -

        Yup, sure thing Gary. Haven't had time to look at it other than updating the test case, but I think it's still an issue.

        Gary Tully added a comment -

        need to look at it in some detail.. guess you should reopen this pending that.

        Stan Lewis added a comment -

        Hey Gary, I modified the test case to use multiple consumers, I think that's when this seems to break, what do you think?

        Gary Tully added a comment -

        Try the new test case, if you can make that fail, please reopen

        Gary Tully added a comment -

        The difference may be the use of a small journal data file length, such that the files are quickly reclaimed when they contain unreferenced data. With a large file, all the messages can exist in one or two data files.
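        For example, a small journal file length can be configured on the kahaDB persistence adapter (the 1mb value here is only illustrative; the default is 32mb):

        <persistenceAdapter>
          <kahaDB directory="${activemq.base}/data/kahadb" journalMaxFileLength="1mb"/>
        </persistenceAdapter>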

        Gary Tully added a comment -

        JUnit test case based on the submitted test, but borrowing from the JUnit test case attached to https://issues.apache.org/activemq/browse/AMQ-2870

        This test case works fine on trunk

        Juraj Kuruc added a comment -

        Attached are the Java files, a sample message and the activemq.xml configuration file for reproducing this issue.

        Gary Tully added a comment -

        Could you attach a test case to this?

        I have just tracked down the cause of the:
        WARN | Duplicate message add attempt rejected. Message id: ID:sk1d069c-3826-1264006781626-0:0:1:1:13425
        but I don't think this could result in dangling references to files in the store.
        So I suspect there is another reason for the data files not being cleaned up. A simple test case would help a lot.


          People

          • Assignee: Gary Tully
          • Reporter: Juraj Kuruc
          • Votes: 0
          • Watchers: 3
