ActiveMQ / AMQ-2935

java.io.EOFException: Chunk stream does not exist at page on broker start

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.4.0, 5.4.1, 5.4.2
    • Fix Version/s: 5.4.2
    • Component/s: Broker
    • Labels: None
    • Environment:

      Win7 32bit, JDK 1.6_20

      Description

      I am seeing this regularly upon restarts in all versions from 5.4.x - I cannot downgrade due to breaking issues in previous versions.
      The broker was shutdown cleanly with no logged issues.
      Deleting the activemq-data directory seems to be the only recovery solution (which is not an option in production)

      2010-09-23 13:54:30,997 [Starting ActiveMQ Broker] ERROR org.apache.activemq.broker.BrokerService - Failed to start ActiveMQ JMS Message Broker. Reason: java.io.EOFException: Chunk stream does not exist at page: 0
      java.io.EOFException: Chunk stream does not exist at page: 0
      at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:454)
      at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:431)
      at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:428)
      at org.apache.kahadb.page.Transaction.load(Transaction.java:404)
      at org.apache.kahadb.page.Transaction.load(Transaction.java:361)
      at org.apache.activemq.broker.scheduler.JobSchedulerStore$3.execute(JobSchedulerStore.java:250)
      at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
      at org.apache.activemq.broker.scheduler.JobSchedulerStore.doStart(JobSchedulerStore.java:239)
      at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:53)
      at org.apache.activemq.broker.scheduler.SchedulerBroker.getStore(SchedulerBroker.java:198)
      at org.apache.activemq.broker.scheduler.SchedulerBroker.getInternalScheduler(SchedulerBroker.java:185)
      at org.apache.activemq.broker.scheduler.SchedulerBroker.start(SchedulerBroker.java:85)
      at org.apache.activemq.broker.BrokerFilter.start(BrokerFilter.java:157)
      at org.apache.activemq.broker.BrokerFilter.start(BrokerFilter.java:157)
      at org.apache.activemq.broker.TransactionBroker.start(TransactionBroker.java:112)
      at org.apache.activemq.broker.BrokerService$3.start(BrokerService.java:1786)
      at org.apache.activemq.broker.BrokerService.start(BrokerService.java:496)
      at org.apache.activemq.ra.ActiveMQResourceAdapter$1.run(ActiveMQResourceAdapter.java:85)

      Attachments

      1. activemq.xml (1 kB) - Andy Gumbrecht
      2. activemq-data.zip (41 kB) - Andy Gumbrecht
      3. stacktraces.txt (4 kB) - Bryan Keller

        Issue Links

          Activity

          Andy Gumbrecht added a comment -

          Here is my config and zipped activemq-data directory

          Eric added a comment -

          I upgraded from 5.3.2 to 5.4.1. Some jars (spring) were updated.

          If I delete all files in activemq-data/xxxx/scheduler and then restart the process, it's OK.

          [sibModule@td0sib01s SIBBusModule-TestDeCharge-td0sib01s]$ ll scheduler
          total 36
          -rw-r--r-- 1 sibModule sibdev     0 Sep 23 17:39 db-1.log
          -rw-r--r-- 1 sibModule sibdev     0 Sep 23 17:39 lock
          -rw-r--r-- 1 sibModule sibdev 16384 Sep 23 17:39 scheduleDB.data
          -rw-r--r-- 1 sibModule sibdev 16408 Sep 23 17:39 scheduleDB.redo

          If I stop the process with a CTRL-C shutdown action and try to restart it without deleting the files, I have this error.

          [sibModule@td0sib01s scheduler]$ ll
          total 36
          -rw-r--r-- 1 sibModule sibdev     0 Sep 23 17:39 db-1.log
          -rw-r--r-- 1 sibModule sibdev 16384 Sep 23 17:44 scheduleDB.data
          -rw-r--r-- 1 sibModule sibdev 16408 Sep 23 17:39 scheduleDB.redo

          I have a MemoryPersistence configuration.

          Eric added a comment - edited

          When I add
          schedulerSupport="false"

          in the "broker" configuration, it seems OK. No activemq-data directory or files under it are created.

          Example :

          <broker xmlns="http://activemq.apache.org/schema/core" brokerName="SIBBusModule-TestDeCharge-td0sib01s" useJmx="true" persistent="false" useShutdownHook="false" schedulerSupport="false">

          Andy Gumbrecht added a comment -

          Thanks Eric,

          Applying the property schedulerSupport="false" to the broker seems to have nipped this in the bud, at least for the few tests I have run.

          Where did you find the docs on this parameter? - All I can find is this: http://activemq.apache.org/delay-and-schedule-message-delivery.html

          As this was introduced recently (in 5.4.x) as a new option, I think 'false' should really be the default, especially as it seems to be broken.

          Bryan Keller added a comment -

          Setting schedulerSupport=false did reduce the likelihood of this error occurring for me. With it set to true, it happens nearly every time I shut down and restart. However, I still get the problem intermittently after shutdown and restart even with it set to false, albeit less frequently. I will attach stack traces of it occurring with both schedulerSupport=true and false (the error occurs in different places during initialization).

          Also, to note, this is a much more frequent occurrence with 5.4.1 vs 5.4.0. I do not have this issue with 5.3.2 so have downgraded to that for now.

          This occurs for me on both Windows 7 64-bit JDK 1.6.21 and Mac OS 10.6.4.

          Bryan Keller added a comment -

          Stack traces of the issue, with schedulerSupport=true and with schedulerSupport=false

          shiweiyuan added a comment -

          I get the same issue when trying ActiveMQ 5.4.1 on Linux with JDK 1.5, and I get around it by deleting all files/folders under data.
          But that is not practical in a production environment.

          Andy Gumbrecht added a comment -

          Yes Bryan, you are right, setting schedulerSupport=false has only 'reduced' the likelihood of this occurring - This is still happening, but less frequently.

          There must be a race condition on shutdown which is corrupting the KahaDB files - I have looked at the startup code, but nothing seems to have changed much recently.

          The corruption is obviously more aggressive in the scheduler code.

          This is a real show stopper for me right now, so I will dig more into this to see if I can find the problem.

          Andy Gumbrecht added a comment -

          I have elevated this to 'Blocker' due to the fact that others are also experiencing this, and that the general solution for everyone is to physically delete files in order to continue - not an option in production.
          The bug is consistently repeatable and is destructive in that files are left in an unrecoverable state. The ability of ActiveMQ to recover from a crash is a feature, so the ability to recover from a simple restart is ultimately critical.

          shiweiyuan added a comment -

          This issue may be caused by lack of disk capacity.

          When using the default configuration, the pending queue policy is a store-based message cursor, and I found that the data folder size increases very rapidly:
          with 300,000 non-persistent messages of 30 bytes each, the file tmpDB.data under data\localhost\tmp_storage\ grew to 1.15 GB in my environment.
          So please check whether your disk is full.

          But I'm still experiencing this issue after making sure the disk is not full.
          I will update this issue if I get new evidence later...

          Andy Gumbrecht added a comment -

          Tested in several environments, of which at least one is a very high spec system, all with plenty of free space. Same results, so nothing to do with capacity.

          This seems to be pretty consistent, but even more so with schedulerSupport=true (occurs on practically every restart).
          The point of failure is easy to track down at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:454) - the question is rather why the KahaDB file is left in an unreadable state after a clean shutdown.

          Frank Gynnild added a comment -

          This is a blocker issue for me, and it seems to be unrelated to the amount of available disk space.

          Adam Hoskins added a comment -

          This renders 5.4.1 unusable for me. Had to roll back to 5.3.2.

          Swapnonil Mukherjee added a comment - edited

          The reason I chose ActiveMQ 5.4 is its "Message Scheduling" feature, which is not in 5.3. We are using the message scheduling feature released with 5.4; our application needs messages to sit on a queue for at least 30 seconds.
          I am using the Spring JMS Template to send messages. This is how I am sending messages.


          public Message createMessage(Session session) throws JMSException
          {
              Date date = new Date();
              String delayInSeconds = properties.getProperty(MESSAGE_DELAY);
              Message message = session.createObjectMessage(mqRequest);
              message.setLongProperty(TIMESTAMP_AS_EPOCH_ATTRIBUTE, date.getTime());
              message.setStringProperty(TIMESTAMP_AS_STRING_ATTRIBUTE, getDateAsString(date));
              if (delayInSeconds != null)
              {
                  LOGGER.info("Delay set at " + delayInSeconds + " seconds");
                  message.setLongProperty(ScheduledMessage.AMQ_SCHEDULED_DELAY, Integer.parseInt(delayInSeconds) * 1000);
              }
              return message;
          }
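One detail worth noting in the snippet above: `Integer.parseInt(delayInSeconds) * 1000` is int arithmetic, which silently overflows once the delay exceeds Integer.MAX_VALUE milliseconds (about 24.8 days). A minimal long-based variant avoids this (the class and helper name here are hypothetical, for illustration only):

```java
public class ScheduledDelay {
    // Convert a delay in seconds (given as text, e.g. from a properties
    // file) to milliseconds using long arithmetic, so large delays do
    // not overflow the int multiplication used in the snippet above.
    public static long toMillis(String delayInSeconds) {
        return Long.parseLong(delayInSeconds.trim()) * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(toMillis("30"));      // 30000
        // 3,000,000 s * 1000 exceeds Integer.MAX_VALUE; int math would wrap.
        System.out.println(toMillis("3000000")); // 3000000000
    }
}
```

The resulting long can be passed unchanged as the value of the AMQ_SCHEDULED_DELAY property.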
          
          

          I can downgrade to 5.3, but then how do I get message scheduling? Please advise.

          I have checked disk capacity. There's no problem with that.
          By the way the stack trace I get is exactly what Andy posted.

          I just deleted all the contents of localhost/scheduler directory, and started the broker again. This time it starts ok and is working fine.

          Gary Tully added a comment -
          useShutdownHook="false"

          is relevant. With a simple embedded broker, Ctrl+C and restart without the shutdown hook, the stores are not stopped; in fact the broker is not stopped. A restart results in this problem every time. The store needs to be flushed on start, which should help but requires some code changes. In addition, this exception should be trapped such that it leads to recovery.

          To work around, either

          useShutdownHook="true"

          needs to be used such that the broker is stopped when the jvm is shutdown or there needs to be an explicit call to broker.stop()

          Eric added a comment -

          Hi Gary

          Can you confirm that "schedulerSupport=false" is enough when we don't plan to use scheduled messages and don't want any files written to disk?

          Eric

          Krzysztof Olszewski added a comment -

          In our configuration we have [and had] useShutdownHook="true", and we do not use a db (memory only).

          broker.setPersistenceAdapter(new MemoryPersistenceAdapter());
          broker.setPersistent(false);
          broker.setUseShutdownHook(true);
          broker.setUseLoggingForShutdownErrors(false);

          If we do not use: "broker.setSchedulerSupport(false);" the problem is 100% reproducible.

          Although we were not able to reproduce it with the scheduler off, we would not take the risk; as someone mentioned above, this setting only reduces the probability of the problem.

          Andy Gumbrecht added a comment -

          Eric,

          Only using "schedulerSupport=false" does NOT resolve this issue. The problem is specifically a KahaDB problem (the default persistence store is KahaDB).

          Swapping out the persistence store is currently the only option. If you need 'any' persistence then I would suggest the jdbcPersistenceAdapter (the default uses a Derby database, but many can be used).
          This is substantially slower than KahaDB, but not enough to worry about if you are only pushing, say, several thousand messages a minute (you'd have to run your own tests). I am in fact happy enough with the performance that I am likely to stick with it for stability's sake even if this issue is resolved.

          <broker xmlns="http://activemq.apache.org/schema/core"
          useJmx="false"
          brokerName="YourName"
          useShutdownHook="false"
          persistent="true"
          start="false"
          schedulerSupport="false"
          enableStatistics="false">

          <persistenceAdapter>
          <jdbcPersistenceAdapter dataDirectory="activemq-data/jdbc"/>
          </persistenceAdapter>

          .....

          Note: If schedulerSupport is enabled then the error will still persist (excuse the pun), because the scheduler uses KahaDB regardless of the persistenceAdapter - so if you want scheduler support then there is currently no option (unless someone knows how to configure the scheduler to use another store?).

          Gary,

          I should have mentioned that my configuration is for an OpenEJB RA, whereby the RA is responsible for starting and stopping ActiveMQ (which it does, and still produces a corrupt KahaDB). The default OpenEJB RA config uses a memory store, but I require persistence for my project. A standalone application (basically as Krzysztof writes) which loops through start/stop will still produce the same stacktrace virtually every time.

          Eric added a comment -

          Hi Andy

          I need neither scheduled messages nor persistence, so I don't need KahaDB at all.

          I wonder if, in this case, "schedulerSupport=false" is enough.

          Eric

          Gary Tully added a comment -

          I have found a problem with the recovery processing in the KahaDB pageFile/index, but the issue only occurs after an abortive close, i.e. the broker's stop method was not called, did not complete, or a file write did not complete with the clean-shutdown flag set.
          With a successful call to broker.stop() the problem is avoided, as there is no recovery of the pageFile from the redo buffer at restart - so this may not be the only issue here.

          The latest 5.5-SNAPSHOT from maven or the repo has the fix. It would be great if you could validate.
          https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-activemq/5.5-SNAPSHOT/

          Andy, Krzysztof, can you verify that the broker is actually stopped in your start/stop tests? There should be messages in the log of the form: 10:55:37,927 [main ] INFO JobSchedulerStore - JobSchedulerStore:activemq-data/localhost/scheduler stopped

          Eric, for your use case, schedulerSupport=false is sufficient.
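The clean-shutdown flag Gary mentions can be illustrated with a generic pattern. This is a sketch of the idea only, not KahaDB's actual implementation, and all names in it are made up: the store writes a marker file after a clean close, and a missing marker at the next open signals that recovery (e.g. replaying a redo buffer) is required.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative clean-shutdown-flag pattern (not KahaDB's real code):
// a marker file distinguishes a clean close from an abortive one.
public class CleanShutdownFlag {
    private final Path marker;

    public CleanShutdownFlag(Path storeDir) throws IOException {
        Files.createDirectories(storeDir);
        this.marker = storeDir.resolve("clean-shutdown");
    }

    // On open: if the marker is absent, the previous close was abortive
    // and recovery is required. The marker is removed immediately, so a
    // crash while running is also detected on the following open.
    public boolean openNeedsRecovery() throws IOException {
        boolean wasClean = Files.deleteIfExists(marker);
        return !wasClean;
    }

    // On clean close: write the marker last, after all data is flushed.
    public void closeCleanly() throws IOException {
        Files.createFile(marker);
    }
}
```

A kill -9 between open and closeCleanly() leaves no marker behind, which is exactly the state the fixed recovery path has to handle.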

          Nik Gehring added a comment -

          Gary - thanks for looking into this. We are going to test whether it solves the issue we're seeing.

          In the meantime a quick question please.... as we are running this in the background in Unix, how do we stop the broker gracefully? The FAQ simply says to kill the process and I presume that this will not call broker.stop() correctly.

          Krzysztof Olszewski added a comment -

          Gary, you are right.
          After deleting activemq-data I verified the following:
          as long as the application does not quit abnormally [so broker.stop() is invoked], everything starts OK;
          only after a crash/abnormal end of the application does the problem appear [in 100% of cases] (deleting the data or setting schedulerSupport to false resolves it).

          Gary Tully added a comment -

          Nik, so long as you don't use kill -9, and use one of the interrupt signals SIGHUP (Unix Only), SIGINT, or SIGTERM instead, the jvm interruption handler will have a chance to kick in and call stop via the shutdownHooks registered by the broker.

          But you can also use ./bin/activemq stop on Unix, see: http://activemq.apache.org/unix-shell-script.html
          I guess activemq stop is the preferred approach; it will try via JMX first and then revert to kill after some timeout.
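Gary's point about signals can be seen with a plain-Java sketch (the class name is hypothetical): SIGINT and SIGTERM let the JVM run its registered shutdown hooks, which is how useShutdownHook arranges for broker.stop() to be called, while kill -9 (SIGKILL) bypasses them entirely.

```java
public class ShutdownHookDemo {
    // Register a hook that the JVM runs on normal exit, Ctrl+C (SIGINT)
    // and SIGTERM -- but never on kill -9 (SIGKILL), which gives the
    // process no chance to clean up its stores.
    public static Thread installStopHook(Runnable stopAction) {
        Thread hook = new Thread(stopAction, "stop-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        installStopHook(() -> System.out.println("stopping cleanly"));
        System.out.println("running; send SIGTERM or Ctrl+C to stop");
    }
}
```

This is why an abortive kill -9 leaves KahaDB without its clean-shutdown state: the hook that would have flushed and stopped the store never runs.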

          Swapnonil Mukherjee added a comment -

          Hi Gary,

          I can indeed confirm that in our case the problem started only after someone killed the activemq process using kill -9. As I reported earlier, when I deleted everything under localhost/scheduler folder and restarted activemq it started cleanly.

          With regards to your suggestion of testing the 5.5 snapshot: are you saying that with 5.5 ActiveMQ can recover correctly from these kill -9 type shutdowns and we won't see these "Chunk stream" errors?

          Gary Tully added a comment -

          Yes. It would be great if you could validate the fix.

          Andy Gumbrecht added a comment -

          Gary,

          Many thanks, the fix works so far. I am still in the process of further testing, but a corrupted KahaDB is now properly recovered on restart.

          I think in my specific RA case that 'not' calling 'waitUntilStopped()' in the RA may have caused the corruption - If the system exits too early then this would be the culprit.

          I'll let you know if and when I find something.

          Gary Tully added a comment -

          Thanks for the validation, Andy. Marking resolved.

          Frank Gynnild added a comment -

          Great that you found and fixed this issue! What is the strategy for backporting critical and blocker fixes like this? The severity of this bug leaves 5.4.* unusable in a production environment, yet it is still announced as the stable release.

          Eduardo Zanni added a comment -

          I agree with Frank. This issue has been affecting us too (we're using Fuse broker 5.4, which in turn relies on ActiveMQ 5.4). Is there any plan for a "patched" 5.4 release with Gary's fix that can be used safely in a production environment?

          Thomas Dudziak added a comment - - edited

          FWIW, I get this issue on the first write into a new KahaDB store as well, though not consistently. In this particular case, ActiveMQ is used in a unit test and KahaDB is configured to write to a new temporary directory. The startup code looks like this:

          File tmpDir = File.createTempFile("activemq", "");
          tmpDir.delete();
          tmpDir.mkdir();

          KahaDBStore kaha = new KahaDBStore();
          kaha.setDirectory(tmpDir); // the snippet's original localDbDir was undefined; the temp directory is what is intended

          BrokerService broker = new BrokerService();
          broker.setUseShutdownHook(true);
          broker.setPersistenceAdapter(kaha);
          broker.addConnector("vm://localhost:12345");
          broker.start();

          Then the very first message (from a JMS producer over a VM connection) fails with this error:

          VMTransport org.apache.activemq.broker.TransportConnection.Transport Transport failed: java.io.EOFException: Chunk stream does not exist at page: 0
          java.io.EOFException: Chunk stream does not exist at page: 0
          at org.apache.kahadb.page.Transaction$2.readPage(Transaction.java:454)
          at org.apache.kahadb.page.Transaction$2.<init>(Transaction.java:431)
          at org.apache.kahadb.page.Transaction.openInputStream(Transaction.java:428)
          at org.apache.kahadb.page.Transaction.load(Transaction.java:404)
          at org.apache.kahadb.page.Transaction.load(Transaction.java:361)
          at org.apache.activemq.broker.scheduler.JobSchedulerStore$3.execute(JobSchedulerStore.java:250)
          at org.apache.kahadb.page.Transaction.execute(Transaction.java:728)
          at org.apache.activemq.broker.scheduler.JobSchedulerStore.doStart(JobSchedulerStore.java:239)
          at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:53)
          at org.apache.activemq.broker.scheduler.SchedulerBroker.getStore(SchedulerBroker.java:197)
          at org.apache.activemq.broker.scheduler.SchedulerBroker.getInternalScheduler(SchedulerBroker.java:184)
          at org.apache.activemq.broker.scheduler.SchedulerBroker.send(SchedulerBroker.java:131)
          at org.apache.activemq.broker.BrokerFilter.send(BrokerFilter.java:129)
          at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:96)
          at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:230)
          at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:135)
          at org.apache.activemq.broker.TransportConnection.processMessage(TransportConnection.java:460)
          at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:663)
          at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:309)
          at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:185)
          at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
          at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:69)
          at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:218)
          at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122)
          at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)

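          As an aside, the createTempFile-then-delete-then-mkdir dance in the snippet above is racy: another process can grab the same name between the delete() and the mkdir(). On JDK 7+ a temporary directory can be created atomically. A pure-JDK sketch of that alternative (nothing ActiveMQ-specific is assumed; the class and prefix names are illustrative):

```java
import java.io.File;
import java.nio.file.Files;

public class TempDirExample {
    // Atomically create a fresh, uniquely named temp directory,
    // avoiding the createTempFile() + delete() + mkdir() race.
    static File createTempDir() throws Exception {
        File tmpDir = Files.createTempDirectory("activemq-test").toFile();
        tmpDir.deleteOnExit();
        return tmpDir;
    }

    public static void main(String[] args) throws Exception {
        File tmpDir = createTempDir();
        System.out.println(tmpDir.isDirectory());
    }
}
```

The resulting File can then be handed straight to kaha.setDirectory(...).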
          Swapnonil Mukherjee added a comment -

          Hi Gary,

          We verified the fix. It works as expected.

          However, since there is no talk of merging this fix back to 5.4, we are going into production from Monday onwards with the 5.5 snapshot.

          We thank you for applying this fix.

          Frank Gynnild added a comment -

          I also tried the fix, and I no longer get the error message. But not all queues work afterwards: some queues work as expected, while nothing arrives on some of the others. There are no error messages either, on either the server or the client side. Does anyone else see this behavior?

          Gary Tully added a comment -

          Frank, can you try and reproduce with trace-level logging?

          Norman Maurer added a comment -

          We saw the same exception in JAMES. IMHO a backport of the fix is a must, because it's really unacceptable that the only way to start again is to purge all data. That's a no-go for a production service.

          Eduardo Zanni added a comment -

          Any news on the possibility of a backport for this issue? We'd like to have scheduled/delayed message delivery available, so reverting to 5.3 wouldn't be a good option for us here.

          Gary Tully added a comment -

          We will cut a new 5.4.2 release once a few more outstanding issues are completed.

          jns added a comment -

          Gary,
          The ActiveMQ 5.5 snapshot fix worked. Thanks for the fix.
          Can you provide an approximate 5.4.2 release date? Just a rough estimate would be fine.

          Richard Bonneau added a comment -

          As requested by Eduardo Zanni above, can anyone supply a workaround for 5.4.0?

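          Earlier in the thread, Swapnonil reported that deleting everything under the localhost/scheduler folder and then restarting let the broker come up cleanly, which avoids wiping the whole activemq-data directory. A hedged sketch of that workaround as a helper function - the localhost/scheduler layout assumes the default broker name, so adjust the path to match your broker configuration, and stop the broker before running it:

```shell
# Workaround sketch, per Swapnonil's report above: remove only the
# corrupted scheduler store, leaving the KahaDB message store intact.
# The localhost/scheduler path assumes the default broker name.
purge_scheduler_store() {
  local data_dir="$1"
  rm -rf "${data_dir}/localhost/scheduler"
}
```

Run it against the broker's data directory (for example, purge_scheduler_store /opt/activemq/data) and then restart the broker; the scheduler store is recreated empty, so any pending scheduled messages are lost.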
          Patrick Monfette added a comment -

          We are also looking for either:

          • A workaround for 5.4.1
          • A 5.4.2 version that includes this fix

          We need to move to 5.4.x from 5.3.2 but cannot do so because of this critical issue.

          Oleg Kozlov added a comment -

          Could one of the ActiveMQ developers provide a date for the next release that will include a fix for this issue? Whichever it turns out to be - 5.4.2 or 5.5.0.

          We are in the final stage of developing a large system that goes to production at the beginning of December, and we are hitting this issue in staging (using 5.4.1). It would be really nice to get a fix before Dec 1. If that's not possible, is there a patch for 5.4.1, or a workaround?

          Much appreciated!

          Regards,
          Oleg.

          Sebastien Rodriguez added a comment -

          @Oleg: The issue is resolved on trunk, so you can always pick up the latest source code there and compile it, or grab the nightly builds.
          There is also a discussion on the mailing list; a 5.4.2 release should come soon.

          Randy Palmer added a comment -

          It seems that schedulerSupport defaulted to true in 5.4.1 but defaults to false in 5.4.2, so you may need to explicitly set schedulerSupport="true" on your broker when you upgrade from 5.4.1 to 5.4.2 if it's not already in your config file.

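          Randy's note above has a practical consequence: brokers that relied on the implicit scheduler in 5.4.1 must opt in explicitly after upgrading. A sketch of the relevant activemq.xml fragment - the schedulerSupport attribute is the one confirmed by the comment above, while brokerName and dataDirectory are typical placeholder values:

```xml
<!-- schedulerSupport defaults to false in 5.4.2, so scheduled/delayed
     delivery must be enabled explicitly on the broker element -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="localhost"
        dataDirectory="${activemq.base}/data"
        schedulerSupport="true">
    <!-- transport connectors, destination policies, etc. -->
</broker>
```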

            People

            • Assignee: Gary Tully
            • Reporter: Andy Gumbrecht
            • Votes: 17
            • Watchers: 36
