ActiveMQ Classic / AMQ-4339

Corrupt KahaDB Journal may cause EOFException at Broker startup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: 5.8.0
    • Fix Version/s: None
    • Component/s: Message Store
    • Environment: Java 1.6.0, most releases; tested on many different hardware configurations, Linux distributions only.

    Description

      When a KahaDB journal is occasionally corrupted, ActiveMQ may fail at startup with an EOFException. The easiest way for us to reproduce this has been to deliberately write junk into a journal file, like so:

      echo "asdf" > db-1.log

      The exception in this case was as follows (this trace is from AMQ 5.6.0, so line numbers may not match 5.8.0, but a similar problem was confirmed on 5.8.0):

      2013-02-13 11:35:27,465 ERROR [main] [broker.BrokerService] Failed to start Apache ActiveMQ (localhost, null). Reason: java.io.EOFException
      java.io.EOFException
          at java.io.RandomAccessFile.readInt(RandomAccessFile.java:776)
          at org.apache.activemq.store.kahadb.disk.journal.DataFileAccessor.readRecord(DataFileAccessor.java:81)
          at org.apache.activemq.store.kahadb.disk.journal.Journal.read(Journal.java:604)
          at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:961)
          at org.apache.activemq.store.kahadb.MessageDatabase.recoverProducerAudit(MessageDatabase.java:629)
          at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:555)
          at org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:369)
          at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:387)
          at org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:240)
          at org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:180)
          at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
          at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:220)
          at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
          at org.apache.activemq.broker.BrokerService.doStartPersistenceAdapter(BrokerService.java:598)
          at org.apache.activemq.broker.BrokerService.startPersistenceAdapter(BrokerService.java:587)
          at org.apache.activemq.broker.BrokerService.start(BrokerService.java:552)
          at com.puppetlabs.mq$start_broker_BANG_.invoke(mq.clj:112)
          at com.puppetlabs.puppetdb.cli.services$_main.doInvoke(services.clj:374)
          at clojure.lang.RestFn.invoke(RestFn.java:421)
          at clojure.lang.Var.invoke(Var.java:419)
          at clojure.lang.AFn.applyToHelper(AFn.java:163)
          at clojure.lang.Var.applyTo(Var.java:532)
          at clojure.core$apply.invoke(core.clj:601)
          at com.puppetlabs.puppetdb.core$_main.doInvoke(core.clj:79)
          at clojure.lang.RestFn.applyTo(RestFn.java:137)
          at com.puppetlabs.puppetdb.core.main(Unknown Source)
      2013-02-13 11:35:27,983 ERROR [main] [puppetlabs.utils] Uncaught exception
      java.io.EOFException
          at java.io.RandomAccessFile.readInt(RandomAccessFile.java:776)
          at org.apache.activemq.store.kahadb.disk.journal.DataFileAccessor.readRecord(DataFileAccessor.java:81)
          at org.apache.activemq.store.kahadb.disk.journal.Journal.read(Journal.java:604)
          at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:961)
          at org.apache.activemq.store.kahadb.MessageDatabase.recoverProducerAudit(MessageDatabase.java:629)
          at org.apache.activemq.store.kahadb.MessageDatabase.recover(MessageDatabase.java:555)
          at org.apache.activemq.store.kahadb.MessageDatabase.open(MessageDatabase.java:369)
          at org.apache.activemq.store.kahadb.MessageDatabase.load(MessageDatabase.java:387)
          at org.apache.activemq.store.kahadb.MessageDatabase.doStart(MessageDatabase.java:240)
          at org.apache.activemq.store.kahadb.KahaDBStore.doStart(KahaDBStore.java:180)
          at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
          at org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter.doStart(KahaDBPersistenceAdapter.java:220)
          at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
          at org.apache.activemq.broker.BrokerService.doStartPersistenceAdapter(BrokerService.java:598)
          at org.apache.activemq.broker.BrokerService.startPersistenceAdapter(BrokerService.java:587)
          at org.apache.activemq.broker.BrokerService.start(BrokerService.java:552)
          at com.puppetlabs.mq$start_broker_BANG_.invoke(mq.clj:112)
          at com.puppetlabs.puppetdb.cli.services$_main.doInvoke(services.clj:374)
          at clojure.lang.RestFn.invoke(RestFn.java:421)
          at clojure.lang.Var.invoke(Var.java:419)
          at clojure.lang.AFn.applyToHelper(AFn.java:163)
          at clojure.lang.Var.applyTo(Var.java:532)
          at clojure.core$apply.invoke(core.clj:601)
          at com.puppetlabs.puppetdb.core$_main.doInvoke(core.clj:79)
          at clojure.lang.RestFn.applyTo(RestFn.java:137)
          at com.puppetlabs.puppetdb.core.main(Unknown Source)
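
      The top frame of the trace, RandomAccessFile.readInt, can be exercised with the JDK alone. The following is a minimal sketch, not KahaDB code: the class name TruncatedJournalDemo, the helper readIntHitsEof, and the offsets are hypothetical and chosen only for illustration. It writes the same 5 bytes that the echo command produces and then tries to read a 4-byte integer at two offsets.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; it does not use any ActiveMQ or KahaDB API.
public class TruncatedJournalDemo {

    /** Returns true if reading a 4-byte int at the given offset runs past end of file. */
    static boolean readIntHitsEof(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);   // seeking beyond the end is allowed; the read is what fails
            raf.readInt();      // needs 4 bytes, throws EOFException if fewer remain
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path junk = Files.createTempFile("db-1", ".log");
        try {
            // Same content as: echo "asdf" > db-1.log (4 letters plus a newline)
            Files.write(junk, "asdf\n".getBytes(StandardCharsets.US_ASCII));
            System.out.println("offset 0: EOF? " + readIntHitsEof(junk, 0)); // false
            System.out.println("offset 4: EOF? " + readIntHitsEof(junk, 4)); // true
        } finally {
            Files.deleteIfExists(junk);
        }
    }
}
```

      Under these assumptions, any read that expects a record at an offset the truncated file no longer covers fails in the same way as the first frame above.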
      

      What is strange about this problem is that, upon a restart of the broker, the journal appears to be 'reset' and things work fine afterwards.

      I want to stress that, as far as we have seen, the corruption was not caused by KahaDB itself. The corruptions were caused by external events such as the disk filling up or bad copies when migrating directories.
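
      For the bad-copy case, one way to catch a damaged journal file before the broker ever reads it is to compare each copied file against its source. This is a hedged sketch using only the JDK; the class name JournalCopyCheck and its methods are hypothetical and are not part of ActiveMQ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; it does not use any ActiveMQ or KahaDB API.
public class JournalCopyCheck {

    /** Computes the SHA-256 digest of a file by streaming its contents. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        return digest.digest();
    }

    /** True if both files have the same length and the same SHA-256 digest. */
    static boolean sameContent(Path source, Path copy)
            throws IOException, NoSuchAlgorithmException {
        return Files.size(source) == Files.size(copy)
                && MessageDigest.isEqual(sha256(source), sha256(copy));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java JournalCopyCheck <source file> <copied file>");
            System.exit(2);
        }
        boolean ok = sameContent(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "copy matches source" : "copy differs from source");
        System.exit(ok ? 0 : 1);
    }
}
```

      Running it against the original and migrated db-1.log would flag a truncated or altered copy; it does not help with the disk-full case, where the source file itself is incomplete.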

      Attachments

        1. AMQ-4339-fix2.patch
          24 kB
          Christian Posta
        2. AMQ-4339-fix.patch
          22 kB
          Christian Posta
        3. AMQ-4339-test-case.patch
          4 kB
          Christian Posta

          People

            Assignee: Unassigned
            Reporter: Ken Barber (ken_barber)
            Votes: 0
            Watchers: 4
