ActiveMQ / AMQ-3210

OutOfMemory error on ActiveMQ startup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 5.4.2
    • Fix Version/s: 5.6.0
    • Component/s: Message Store
    • Labels: None
    • Environment:
      $ java -version
      java version "1.6.0_18"
      OpenJDK Runtime Environment (IcedTea6 1.8.3) (6b18-1.8.3-2~lenny1)
      OpenJDK Client VM (build 16.0-b13, mixed mode, sharing)
      $ cat /etc/debian_version
      5.0.8

      Description

      When trying to start ActiveMQ, I get OutOfMemory errors and the startup simply fails, probably due to some kind of message store corruption.

      This can be solved by deleting /var/local/apache-activemq/kahadb, after which ActiveMQ starts with no issue.

      This issue doesn't always happen, and I'm not sure of a scenario that reliably reproduces it from scratch. I do have a corrupted kahadb directory (attached) that reproduces the problem.
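
      Below is a minimal sketch of that workaround as a small Java utility, in case the cleanup needs to be scripted; the store path is the one from this report. Deleting the store discards all persisted messages, so this should only run while the broker is stopped.

      import java.io.IOException;
      import java.io.UncheckedIOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.Comparator;
      import java.util.stream.Stream;

      public class PurgeKahaDb {
          public static void main(String[] args) throws IOException {
              // Store location from this report; adjust for other installations.
              Path store = Paths.get("/var/local/apache-activemq/kahadb");
              if (!Files.exists(store)) {
                  System.out.println("Nothing to delete: " + store);
                  return;
              }
              // Walk depth-first in reverse order so files are deleted
              // before their parent directories.
              try (Stream<Path> walk = Files.walk(store)) {
                  walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                      try {
                          Files.delete(p);
                      } catch (IOException e) {
                          throw new UncheckedIOException(e);
                      }
                  });
              }
              System.out.println("Deleted " + store + "; ActiveMQ will recreate it on startup.");
          }
      }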

      Attachments

      1. kahadb.tar.bz2 (532 kB) - Lior Okman
      2. exception.log (116 kB) - Lior Okman
      3. activemq.xml (5 kB) - Lior Okman

        Activity

        Lior Okman added a comment -

        The full exception that is shown on startup.

        Lior Okman added a comment -

        The activemq.xml file being used

        Lior Okman added a comment -

        The corrupt KahaDB directory.

        In order to reproduce this issue, just start ActiveMQ with this KahaDB in /var/local/apache-activemq.

        Sree Panchajanyam D added a comment -

        Please answer the following queries:
        Were you trying to restart the ActiveMQ server after a crash? If so, what were the conditions it crashed under?
        How much heap space was allocated to ActiveMQ? Were there any persistent messages pending delivery when you tried to start ActiveMQ? If so, please give the number of pending messages. As I see it, a huge number of undelivered persistent messages could be the issue.

        Lior Okman added a comment -

        > Were you trying to restart the ActiveMQ server after a crash?

        ActiveMQ is being run as a daemon with the Tanuki Software wrapper, with wrapper.on_exit.default set to RESTART, so after the server crashes it is automatically restarted.

        > If so, what were the conditions it crashed under?

        On startup, the application sends on the order of a few hundred messages on 6 topics. All the messages being sent are persistent, and there are 3 applications communicating via ActiveMQ: one written in Java and two written in C++ with the CMS driver downloaded from the project homepage. Everything runs on the same host.

        > How much heap space was allocated to ActiveMQ?

        ActiveMQ is started with -Xmx512m.

        > Were there any persistent messages pending delivery when you tried to start ActiveMQ? If so, please give the number of pending messages.

        When these crashes occurred, most of the messages sent via ActiveMQ were persistent. Unfortunately, I don't know how to check how many messages are pending, since ActiveMQ doesn't restart, which prevents me from checking via the admin console.

        The application was modified to not send any persistent messages, and KahaDB was disabled via the ActiveMQ configuration. Since these changes, the crash hasn't reoccurred.

        > As I see it, a huge number of undelivered persistent messages could be the issue.

        Is there anything I can do to help resolve this issue?
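
        For illustration, here is a minimal JMS producer sketch of the change described above: switching to non-persistent delivery, with a time to live as suggested later in this thread. The broker URL and topic name are placeholders, not details from this report.

        import javax.jms.Connection;
        import javax.jms.ConnectionFactory;
        import javax.jms.DeliveryMode;
        import javax.jms.JMSException;
        import javax.jms.MessageProducer;
        import javax.jms.Session;
        import javax.jms.Topic;
        import org.apache.activemq.ActiveMQConnectionFactory;

        public class StartupPublisher {
            public static void main(String[] args) throws JMSException {
                ConnectionFactory factory =
                        new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder URL
                Connection connection = factory.createConnection();
                connection.start();
                try {
                    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                    Topic topic = session.createTopic("example.status"); // placeholder topic
                    MessageProducer producer = session.createProducer(topic);

                    // Non-persistent messages never reach KahaDB, so they cannot
                    // grow the store or be replayed on broker startup.
                    producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);

                    // If persistence is required instead, bound store growth with a TTL:
                    // producer.setDeliveryMode(DeliveryMode.PERSISTENT);
                    producer.setTimeToLive(60000); // expire messages after one minute

                    producer.send(session.createTextMessage("startup notification"));
                } finally {
                    connection.close();
                }
            }
        }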

        Sree Panchajanyam D added a comment -

        In this particular case the KahaDB metadata got corrupted, which is preventing the server from coming up. I don't see any other way to resolve this issue, as we do not know what caused the crash. In any case, failure to update the metadata (the *.redo file) before the server goes down is causing the issue.

        Lior Okman added a comment -

        Is there any way to identify this corruption on startup and purge the database if it is identified, so that the ActiveMQ process can start without an operator having to manually delete the KahaDB files?

        Sree Panchajanyam D added a comment - edited

        Corrupt journal files can be identified, but corrupt metadata cannot.
        You can ensure that the metadata is synced regularly by setting the parameters "indexWriteBatchSize" and "checkpointInterval" to suitably low values. Take a look at the documentation for these parameters at the links below:
        http://activemq.apache.org/kahadb.html
        http://fusesource.com/docs/broker/5.5/persistence/index.html (Optimizing the Metadata Cache)
        Metadata is not synced with the cache during server crashes.
        Hence, the best thing to do is to prevent ActiveMQ from crashing.
        I see that in your XML you have enabled producer flow control; I would advocate against it if you are not sure why you need it.
        If you are using persistent messages, use them with a time to live. Allocate store space with the following calculation: "store space = no. of messages/second * avg message size * time to live * 2".
        For non-persistent messages the above calculation will not hold.

        PS: in activemq.xml:

        <persistenceAdapter>
          <kahaDB directory="${activemq.base}/data/kahadb"
                  checkForCorruptJournalFiles="true"
                  checksumJournalFiles="true"
                  indexWriteBatchSize="1000"
                  checkpointInterval="1000"/>
        </persistenceAdapter>
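
        For completeness, here is a sketch of the same tuning applied programmatically to an embedded broker; in ActiveMQ 5.x these setters mirror the XML attributes above. The values are the ones suggested in this comment, and the sizing figures in the code comments are only a worked instance of the rule of thumb above, with an assumed message rate, size, and time to live.

        import java.io.File;
        import org.apache.activemq.broker.BrokerService;
        import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

        public class TunedBroker {
            public static void main(String[] args) throws Exception {
                KahaDBPersistenceAdapter kahaDb = new KahaDBPersistenceAdapter();
                kahaDb.setDirectory(new File("/var/local/apache-activemq/kahadb"));
                kahaDb.setCheckForCorruptJournalFiles(true); // validate journal files on startup
                kahaDb.setChecksumJournalFiles(true);        // checksum journal writes so corruption is detectable
                kahaDb.setIndexWriteBatchSize(1000);         // flush the index after 1000 writes
                kahaDb.setCheckpointInterval(1000);          // checkpoint the journal every second (ms)

                // Worked instance of the sizing rule above, assuming 100 msg/s of
                // 1 KiB messages with a 60 s time to live:
                //   100 * 1024 bytes * 60 * 2 = ~12 MiB of store space.

                BrokerService broker = new BrokerService();
                broker.setPersistenceAdapter(kahaDb);
                broker.start();
                broker.waitUntilStopped();
            }
        }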

        Lior Okman added a comment -

        Thanks, I'll try this configuration change.

        Sree Panchajanyam D added a comment -

        Did the suggested changes solve the problem?

        Lior Okman added a comment -

        Still checking.

        Since I don't have a scenario to recreate the issue and the issue doesn't always happen, I can't say for sure if the configuration change fixed it.

        Gary Tully added a comment -

        Marking this as incomplete, as there is no test case and the suggestions in the comments above should help.
        If the index is out of sync with the journal, deleting just the index file (db.data) is sufficient to have the index rebuilt automatically on restart by replaying the journal.
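
        A minimal sketch of that recovery step, assuming the store location from this report; run it only while the broker is stopped.

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.Paths;

        public class RebuildKahaDbIndex {
            public static void main(String[] args) throws IOException {
                // Index file location from this report; the journal (db-*.log) is left intact.
                Path index = Paths.get("/var/local/apache-activemq/kahadb/db.data");
                if (Files.deleteIfExists(index)) {
                    System.out.println("Deleted " + index
                            + "; the index will be rebuilt from the journal on the next start.");
                } else {
                    System.out.println("No index file found at " + index);
                }
            }
        }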


          People

          • Assignee: Unassigned
          • Reporter: Lior Okman
          • Votes: 0
          • Watchers: 2
