Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.3.2
    • Fix Version/s: 5.5.0
    • Component/s: None
    • Labels:
      None

      Description

      Over time, we're seeing that KahaDB doesn't clean up old journal files. As a result, we eventually run out of disk space, or rather, we hit our usage limits and our producers are slowed down by the producer flow control mechanism. Others have experienced this problem too (for example, see http://mail-archives.apache.org/mod_mbox/activemq-users/201002.mbox/%3C27502591.post@talk.nabble.com%3E).

      For now, we're moving back to the old amqPersistenceStore.

      1. MyKahaDBStore.java
        1 kB
        Eduardo Costa
      2. amq-2987-testcase.patch
        8 kB
        Jeff Genender
      3. amq-2987.patch
        2 kB
        Jeff Genender
      4. AMQ2736Test_should_with_this_diff.txt
        9 kB
        Gary Tully
      5. AMQ-2736.zip
        3 kB
        Eduardo Costa
      There are no Sub-Tasks for this issue.

        Activity

        Vasile M added a comment - edited

        I just downloaded version 5.6.0 of ActiveMQ.
        I took the old configuration I was using with 5.5.1 and repeated the tests with my application.
        The problems appear to be fixed!
        This means the fix is not in version 5.5.0, as this issue currently states.
        Please update the Fix Version/s field so people know that version 5.6.0 contains the fix.
        I won't open a new bug because it makes no sense.

        Vasile M added a comment - edited

        I want to add to my previous comment that we are also using the ActiveMQ CMS C++ library with STOMP.
        This is the second use case.
        We have one producer and one subscriber, and this is already causing problems: the log files keep growing.
        In C++ we use Session::AUTO_ACKNOWLEDGE, and even in this case the log files are not deleted.
        We write to the queue using a producer while another process reads from it using a listener, so we never have many messages in the queue.

        QueueName   Pending messages   Listeners   Messages enqueued   Messages dequeued
        queuename   0                  1           21522               21522

        Is this perhaps a new use case too?

        Gary Tully added a comment -

        @Vasile
        If you can reproduce this at will and can provide a test case, please open a new JIRA issue to track your use case. Also, please validate that your use case still fails on trunk using the latest snapshot.

        Vasile M added a comment -

        Hi guys,

        I am using version 5.5.1 of ActiveMQ and I can still reproduce this problem.
        I saw you have fixed some problems, but I think my case is different.
        The issue most closely related to mine is https://issues.apache.org/jira/browse/AMQ-2983

        In my case I am not using local transactions; I am using global transactions (XA transactions), and even though I always commit the transactions, the log files are still not deleted.
        The JMS connection pool in my case is defined in GlassFish using the ActiveMQ Resource Adapter.
        Have you tested with global transactions too?

        Gary Tully added a comment -

        Auto consolidation would be a nice enhancement. Can you open a new issue to track that?

        As you know, you can also reduce the journal data file size to reduce the overall disk usage in this case.

        One other thought: using a Camel route, you can easily consume and re-submit messages so that they are moved to the current journal data file. Have a look at the route from the DLQ back to the original destination for inspiration: https://issues.apache.org/jira/browse/AMQ-2710. The idea would be to periodically fire a consumer, using Quartz, that removes and re-adds messages. See the test that pulls the original destination from the message.

        http://svn.apache.org/viewvc/activemq/trunk/activemq-camel/src/test/resources/org/apache/activemq/camel/CamelRedeliveryTest-context.xml?view=markup
        http://svn.apache.org/viewvc/activemq/trunk/activemq-camel/src/test/java/org/apache/activemq/camel/CamelRedeliveryTest.java?view=markup
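        For illustration, a minimal embedded-broker sketch of reducing the journal data file size via the KahaDB persistence adapter. This is only a sketch of the suggestion above; the directory, connector URL, and 16 MB size are placeholders, not values taken from this issue.

        import java.io.File;

        import org.apache.activemq.broker.BrokerService;
        import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

        public class SmallJournalBroker {
            public static void main(String[] args) throws Exception {
                // Smaller journal files mean a single stuck message pins less disk space per file.
                KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
                kahaDB.setDirectory(new File("data/kahadb"));          // placeholder path
                kahaDB.setJournalMaxFileLength(16 * 1024 * 1024);      // placeholder size

                BrokerService broker = new BrokerService();
                broker.setPersistenceAdapter(kahaDB);
                broker.addConnector("tcp://localhost:61616");
                broker.start();
                broker.waitUntilStopped();
            }
        }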

        Stephanie Miotke added a comment -

        This does not appear to be fixed in 5.5.

        I have a queue that holds onto a message every once in a while due to bad formatting or some similar cause (not something I can immediately correct). Since KahaDB retains the full contents of a data file while even a single message in it persists, the file system can grow dramatically if you alter the default KahaDB data file size at all. With data files of 96 MB each, I've eaten up almost 5 GB of disk space with 34 pending messages, each less than 2000 bytes. I'm going to begin auto-purging the offending queue (since we don't require those messages), but it would be preferable to have some automatic process consolidate the store once it has reached a pre-configured limit.

        Martin Carpella added a comment - edited

        I am seeing this or a similar issue on a 5.5.0 broker as well: log files are not being deleted. I have now deleted all queues, but I am still seeing the checkpoint task logging "DEBUG | queue://a.deleted.queue expiring messages .." in the logs.

        I stopped ActiveMQ afterwards and restarted it after deleting all queues, and the space is still occupied.

        During the restart, even after I had deleted all queues, I had:

        DEBUG | XA Transaction new/begin : XID:1096044365:3137322e31372e312e352e746d31353531373030333432:3137322e31372e312e352e746d37313330363334
        DEBUG | Ack for non existent subscription in recovery, ack:MessageAck {commandId = 66898301, responseRequired = false, ackType = 2, consumerId = ID:CLIENT-001-40008-1306310098527-2:5:10553821:1, firstMessageId = e041a486-8c88-3855-304e-4c9eac0f85b1:0:61129:61129, lastMessageId = e041a486-8c88-3855-304e-4c9eac0f85b1:0:61129:61129, destination = queue://a.deleted.queue, transactionId = XID:1096044365:3137322e31372e312e352e746d31353531373030333432:3137322e31372e312e352e746d37313330363334, messageCount = 1, poisonCause = null}
        

        After the restart, all the space was still occupied and log files were never deleted. But at least there are no more expiry entries for the now-deleted queues.

        The KahaDB version is 3 according to the startup logs. Unfortunately I cannot provide steps to reproduce for 5.5.0 at the moment, and the store in question is several GB in size and contains sensitive information I cannot share.

        Should I re-open this issue or file this as a new issue?

        Gary Tully added a comment -

        Fixed up the abortive shutdown case so that recovered local transactions are automatically rolled back.

        http://svn.apache.org/viewvc?rev=1071732&view=rev

        I think the compact journal data files feature needs another issue.

        Gary Tully added a comment -

        @Martin, thanks for the feedback. The second issue is indeed a problem: on normal shutdown, in-flight (non-XA) transactions are rolled back. The same rollback should happen after recovery, to deal with the abortive shutdown case.

        Martin Cekal added a comment -

        Hello Dejan,
        We have seen "storage full" on KahaDB several times, so I experimented a little and found two reasons for it (the second seems to be an AMQ bug):
        1) Unconsumed messages in any data file block that data file forever. So with the default setup of 33 MB per file and a 1 GB limit, roughly 30 small messages spread across the data files (one per file) can block the whole system.
        (I assume this is not a bug, just unpleasant behaviour - why not replay (i.e. rewrite) such messages to the latest data file, at least during an AMQ restart?)

        2) The second issue looks like a bug to me. If you open a transacted session and send a message but do not commit it, a data file is blocked. Again, if you spread such uncommitted messages over all the data files, you get "storage is full". That is acceptable - someone forgot to commit - but the problem comes if you kill AMQ with signal -9 (or via Task Manager on Windows). After such a non-standard kill (i.e. like a hardware failure), the uncommitted transaction seems to block data files forever. A new start of AMQ somehow reads the transaction again and keeps waiting for a commit, although the producer is no longer alive. If you do a regular shutdown, so that the ShutdownHook on the broker is fully executed, the next AMQ start correctly recognizes whether the producer of the uncommitted transaction (with failover transport) is still up (in which case the transaction is kept and the messages are replayed to the new/latest data file) or down (in which case the transaction is rolled back).
        I think AMQ should behave this way even if it was previously killed brutally (kill -9).

        I have tested these two scenarios on both 5.3.2 and 5.4.2 with the same result. We would like to get a patch for the 5.3.2 version. Is that possible?

        Thanks a lot,
        Martin
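        For illustration, a minimal JMS sketch of the uncommitted-send scenario in point 2 above. The broker URL and queue name are placeholders, not values from this issue.

        import javax.jms.Connection;
        import javax.jms.ConnectionFactory;
        import javax.jms.MessageProducer;
        import javax.jms.Session;

        import org.apache.activemq.ActiveMQConnectionFactory;

        public class UncommittedSendSketch {
            public static void main(String[] args) throws Exception {
                ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
                Connection connection = factory.createConnection();
                connection.start();

                // Transacted session: the send only becomes visible after session.commit().
                Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
                MessageProducer producer = session.createProducer(session.createQueue("TEST.QUEUE"));
                producer.send(session.createTextMessage("uncommitted message"));

                // Deliberately no session.commit(): while this process sleeps, the open
                // transaction pins the current journal data file; killing the broker with
                // -9 at this point matches scenario 2 above.
                Thread.sleep(Long.MAX_VALUE);
            }
        }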

        Dejan Bosanac added a comment -

        Can anyone create a test case, or provide steps to reproduce the problem, with the 5.4.2 release? Thanks!

        James Green added a comment -

        Hmm, something's not right on our 5.4.2 instance either. A 5 GB store was reporting itself (after purging some queues) as ~89% full, while on disk it was 4.4 GB. After restarting ActiveMQ it went down to 1.6 GB and 29% full.

        Adrian Lisenberg added a comment -

        I think the same. I installed 5.4.2 but the store is still growing. I will roll back to 5.4.1 with the patches.

        Thanks

        Matthew Von-Maszewski added a comment -

        No. Not fixed in 5.4.2 ... just found that out today. The hard way.

        Straun added a comment -

        5.4.2 does include AMQ-2982 and AMQ-2983; whether that solves the whole issue I am unsure, but using 5.4.2 worked for me.

        Adrian Lisenberg added a comment -

        Hi, is this bug fixed in version 5.4.2, or do I need to apply the patch too?

        thanks!

        Jeff Genender added a comment -

        Steve, I am having difficulty reproducing this... can you please attach a test case?

        Steve Pietrowicz added a comment -

        I've been trying to track down a solution to a problem similar to this, and the latest snapshot of 5.5 doesn't resolve this for me.

        This is the scenario I've created:

        1) a producer starts sending messages
        2) a consumer starts consuming messages, but pauses after 100.
        3) db-##.log files start showing up in the tmp_storage directory as expected. I let the producer go for a while so a number of db-##.log files are created.
        4) the consumer starts consuming messages again. The db-##.log files start disappearing as they're used up.
        5) everything finishes.

        Now, if I do:

        1) a producer starts sending messages
        2) a consumer starts consuming messages, but pauses after 100.
        3) db-##.log files start showing up in the tmp_storage directory as expected. I let the producer go for a while so that a number of db-##.log files are created.

        and then kill the consumer and the producer, the db-##.log files remain and will not get cleaned up, even on subsequent invocations of new consumers and producers. If a new consumer/producer pair (as in the first scenario) is created, it all works; it's just that those old .log files never get cleaned up.
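        For illustration, a rough sketch of the consumer half of that scenario; killing this JVM (and the producer) at the pause point corresponds to the second sequence above. The broker URL, queue name, and message count are placeholders, not values from this issue.

        import javax.jms.Connection;
        import javax.jms.MessageConsumer;
        import javax.jms.Session;

        import org.apache.activemq.ActiveMQConnectionFactory;

        public class PausingConsumerSketch {
            public static void main(String[] args) throws Exception {
                Connection connection =
                        new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
                connection.start();

                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(session.createQueue("TEST.QUEUE"));

                // Consume the first 100 messages, then pause; store files accumulate while
                // the producer keeps sending.
                for (int i = 0; i < 100; i++) {
                    consumer.receive();
                }
                Thread.sleep(Long.MAX_VALUE); // "pause" - kill both processes here to hit the bad case
            }
        }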

        Christer WIkman added a comment -

        Use Jeff's patch for 5.4.1, http://people.apache.org/~jgenender/apache-activemq-5.4.1-AMQ2736-AMQ2982-AMQ2983-AMQ2935-bin.tar.gz

        I deployed it in production today and it totally removed the problem with growing log files. It went down from 78 log files to just 1 log file.

        Adrian Lisenberg added a comment -

        Hi, I really need this fix! So, in the end, did you test the fix on 5.4.1, or are you going to put 5.5 into production?

        Excuse me, but which file contains the correct patch?

        One more question: after applying this fix, would there be any way to delete the old log files in order to reclaim unused space? I am at 100 GB and I need to maintain the current state of the broker...

        thanks.

        Dejan Bosanac added a comment -

        BTW, we released a version of Fuse Message Broker that contains a fix for this issue. If anyone is interested, it can be found here:

        http://repo.fusesource.com/nexus/content/repositories/releases/org/apache/activemq/apache-activemq/5.4.1-fuse-01-00/

        Release notes available at: http://fusesource.com/wiki/display/ProdInfo/FUSE+Message+Broker+v5.4.1-fuse+Release+Notes

        Christer WIkman added a comment -

        I have been running Jeff's fix in staging for 5 days and the log files have not grown. I'm now comfortable using it in production.

        Many thanks for all help!

        Christer WIkman added a comment -

        @Jeff - Thanks! I have deployed your fix in our staging environment. It works much better, but not 100%: all queues are empty, yet I still have 4 log files - one contains 0 bytes and was last updated yesterday, and another was last updated 6 hours ago. When I used 5.5-SNAPSHOT it went down to 1 log file!

        I will evaluate it for a few more days before I can tell for sure whether the log files grow.

        Jeff Genender added a comment -

        @Christer WIkman - I built a distribution with the following JIRA patches:

        http://people.apache.org/~jgenender/apache-activemq-5.4.1-AMQ2736-AMQ2982-AMQ2983-AMQ2935-bin.tar.gz

        YMMV and this is not an official release. Post back if this works for you.

        Christer WIkman added a comment -

        I deployed the latest 5.5-SNAPSHOT in staging yesterday and it seems that the problem is solved. This morning I had 17 log files, but after I deleted the DLQ queues with messages I ended up with only 1 log file! When I used 5.4.1, the files grew by approximately 50-100 per day.

        We are not comfortable deploying 5.5-SNAPSHOT in production. Can anyone provide me with a patch for 5.4.1? Currently, I cannot recommend using version 5.4.1 in production without the combined fixes from 5.5-SNAPSHOT.

        Gary Tully added a comment -

        Christer, please validate with the 5.5-SNAPSHOT, which has the required combined fixes.
        https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-activemq/5.5-SNAPSHOT/

        Christer WIkman added a comment -

        We are running http://people.apache.org/~jgenender/apache-activemq-5.4.1-AMQ2736-bin.tar.gz and we are still experiencing problems with log files that are not deleted.

        I deployed the fix yesterday with a fresh KahaDB, and this morning I had 155 log files with only 43 messages on two DLQ queues. I deleted the DLQ queues and only 12 log files were deleted. Currently I have 144 log files and nothing in the queues.

        Dejan Bosanac added a comment -

        Just to wrap it all up, there were three issues spotted:

        • one spotted by Jeff and fixed by his supplied patch, and
        • two covered by AMQ-2982 and AMQ-2983

        Thanks to all who contributed, especially Jeff and Swen.

        Dejan Bosanac added a comment -

        The potential issue from my last comment was a false alarm. I believe this issue is now fixed. Please test the latest snapshot

        https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-activemq/5.5-SNAPSHOT/

        and reopen if you find any problems.

        Thanks

        Dejan Bosanac added a comment -

        AMQ-2982 and AMQ-2983 are now fixed on trunk, so you can test whether that helps. I spotted one more potential issue in the area and I'm working on it now, so stay tuned.

        Giampaolo Tranchida added a comment -

        I think the fix for AMQ-2982 should also fix this issue.

        Giampaolo Tranchida added a comment -

        Does not work for me; same issue.

        Jeff Genender added a comment -

        Here is the 5.4.1 distribution with the patch in it. This is not an official release... could people please test it out?

        http://people.apache.org/~jgenender/apache-activemq-5.4.1-AMQ2736-bin.tar.gz

        Jeff Genender added a comment -

        @Christer WIkman - Gary's attachment wasn't a patch; it was a revert intended to validate that the test case fails. Please try my patch.

        Jeff Genender added a comment -

        Folks..

        Please try the patch supplied above: https://issues.apache.org/activemq/secure/attachment/19715/amq-2987.patch

        That appears to fix the problem with the AMQ-2736.zip test.

        Abhishek C added a comment -

        @Swen Does using non-transacted sessions with auto-acknowledge clear the log files?
        We have been facing a similar problem; however, we are using the Kaha persistence adapter after upgrading from 5.3.2 to 5.4.1. We had to upgrade to 5.4.1 because, with durable topic subscriptions, our data files weren't getting cleaned up. Now our index-subs files are creating an issue.

        Is there any final fix for this problem?

        Swen Moczarski added a comment -

        We also had some trouble with growing log files (version 5.4.1). I have identified two issues with similar symptoms (but they seem to have a different cause than this issue):

        • AMQ-2982: sticky log files on rollback of a local transaction (test case and patch available)
        • AMQ-2983: sticky log files with multiple concurrent consumers using locally transacted sessions (test case available)

        If we only use non-transacted sessions with auto-acknowledge, we have no problems.
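        For reference, a short sketch contrasting the two session modes discussed above, using standard JMS calls; the broker URL is a placeholder.

        import javax.jms.Connection;
        import javax.jms.Session;

        import org.apache.activemq.ActiveMQConnectionFactory;

        public class SessionModes {
            public static void main(String[] args) throws Exception {
                Connection connection =
                        new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();

                // Locally transacted session: work only completes with session.commit()
                // (or is undone with session.rollback()).
                Session transacted = connection.createSession(true, Session.SESSION_TRANSACTED);

                // Non-transacted, auto-acknowledge session: the mode reported above as problem-free.
                Session autoAck = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

                connection.close();
            }
        }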

        Christer WIkman added a comment -

        I'm using ActiveMQ 5.4.1 in production and I have applied Gary's patch, but the log files are still growing (even when all queues are empty) and eventually ActiveMQ crashes. If I restart ActiveMQ, the log files are deleted.

        Any suggestions on how to fix the problem?

        Eduardo Costa added a comment -

        I'm attaching a nasty workaround that commits the pending transactions and automatically archives (or deletes) the old log files when closing the store. I'm not proud of it, but it seems to prove that the problem lies in the pending transactions, since all the useless files are correctly disposed of after running it.

        Eduardo Costa added a comment -

        After some evil debugging tricks, I found that there are some pending in-flight transactions (even after restarting the server) in my database files. In this scenario, the gcCandidateSet ends up empty, since "metadata.firstInProgressTransactionLocation" points to the first log file (MessageDatabase.java, lines 1140-1154):

        // Don't GC files after the first in progress tx
        Location firstTxLocation = metadata.lastUpdate;
        if( metadata.firstInProgressTransactionLocation!=null ) {
          firstTxLocation = metadata.firstInProgressTransactionLocation;
        }
        
        if( firstTxLocation!=null ) {
          while( !gcCandidateSet.isEmpty() ) {
            Integer last = gcCandidateSet.last();
            if( last >= firstTxLocation.getDataFileId() ) {
              gcCandidateSet.remove(last);
            } else {
              break;
            }
          }
        }
        

        This leads to the question: why must it discard every log file starting from the one containing the first pending transaction? Any malfunctioning operation (even a one-time little mistake) will keep every log file from that point on from being cleaned up!

        I believe this also happens on "kill -9" or an unexpected power-down.
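        To make the pruning concrete, here is a hypothetical trace of the loop above (illustrative only, not from an actual run): if the first in-progress transaction is located in db-1.log, every candidate file id is >= 1, so the whole candidate set is drained and nothing is deleted.

        firstInProgressTransactionLocation -> db-1.log (dataFileId = 1)
        gcCandidateSet before pruning      = [1, 2, 3, 4, 5, 6, 7, 8]
        loop removes 8, 7, 6, 5, 4, 3, 2, 1   (each last() >= 1)
        gcCandidateSet after pruning       = []   -> no journal files are cleaned up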

        Gary Tully added a comment -

        The scenario that needs to be tested, for future reference:

        ackMessageFileMap = {2=[1], 3=[1, 2], 4=[2, 3, 4], 5=[4, 5], 6=[5, 6], 7=[6, 7], 8=[7, 8]}
        
        gCandidateSet = [2, 3, 4, 5, 6, 7]
        Gary Tully added a comment -

        Attaching the diff with the test case and the fix reverted, in which case the test should fail - but it does not at the moment.
        Just so we are on the same page: I may be missing something, but I would expect it to fail.

        Jeff Genender added a comment -

        Test case attached

        Gary Tully added a comment -

        Jeff, committed the fix, thanks - r1005806.
        Awaiting the test case to protect it. Nice work.

        Jeff Genender added a comment -

        My last comment was a red herring. I found the bug... it's a logic bug. A patch has been attached. I will try to create a unit test, but I wanted to get the patch up since this is a critical bug.

        Jeff Genender added a comment -

        Oops... sorry for the multiple posts :/

        Jeff Genender added a comment -

        I ran the attached example and was able to reproduce the issue: the store does indeed appear to grow endlessly. I took a thread dump during the condition and there appears to be a possible locking problem between multiple "KahaDB Scheduler" threads.

        As an experiment I killed one of the KahaDB Scheduler threads and the store seemed to clean up. I re-ran the example several times and it kept cleaning up, so this may be the culprit.

        So please try this as a workaround:

        In the broker configuration, set schedulerSupport="false" and see if the files clean up. This may help until we identify the waits on the multiple schedulers.
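        For illustration, a minimal embedded-broker sketch of that workaround; in XML it is simply schedulerSupport="false" on the broker element. The connector URL is a placeholder.

        import org.apache.activemq.broker.BrokerService;

        public class NoSchedulerBroker {
            public static void main(String[] args) throws Exception {
                BrokerService broker = new BrokerService();
                broker.setSchedulerSupport(false); // workaround: avoid the suspected KahaDB Scheduler lock-up
                broker.addConnector("tcp://localhost:61616");
                broker.start();
                broker.waitUntilStopped();
            }
        }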

        Jeff Genender added a comment -

        Looks like there may be a wait/lock condition between multiple "KahaDB Scheduler" threads...

        I took the attached example and was able to reproduce the problem. A thread dump showed waits on the "KahaDB Scheduler". I knocked out one of the schedulers and the store appears to clean up.

        Hence, can you please try setting schedulerSupport="false" in the broker configuration and see if the files no longer grow? This may be a workaround for the time being. Hopefully this is in the right direction...

        Eduardo Costa added a comment -

        I'm attaching a test case that reproduces what I'm seeing here.

        I tested with a producer sending 1 million messages (in 100-message blocks) and a consumer. After some time, I receive "Usage Manager Store is Full, 100% of 1073741824. Stopping producer (...)" with only a hundred messages on the queue!

        Gary Tully added a comment -

        Straun, can you provide some sort of test case that reproduces what you are seeing? Maybe a simple Java producer/consumer that mimics what your .Net app does.

        Straun added a comment -

        I am seeing this with a broker holding transactional persistent messages (100k messages a day, about 10k per message); the queue sizes are all around 0 normally, so I would expect the files to be cleared. I am using 5.4.0. The consumer of the messages is a .Net client with 'auto acknowledge' set. KahaDB also appears to slow down as it runs: writes to the queues slow down so that each commit cycle takes about 10% longer than the last.

        Adrian Trenaman added a comment -

        Hi Gary,

        Just retested against the very latest FUSE distribution (5.3.1-fuse-01-00) and the log files are being deleted. Looks like this may have been fixed. Will mark this as resolved for now, and revisit if we begin to see the errors again.

        /Ade

        Adrian Trenaman added a comment -

        Hi Gary,

        I'm afraid I don't have a test case for this right now; just observed behaviour on our systems here, and verified by other users via a Google search.

        /Ade

        Gary Tully added a comment -

        Could you provide a test case or scenario description that can reproduce?


          People

          • Assignee: Gary Tully
          • Reporter: Adrian Trenaman
          • Votes: 12
          • Watchers: 25
