ActiveMQ / AMQ-4837

LevelDB corrupted when in a replication cluster


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.9.0
    • Fix Version/s: 5.9.1, 5.10.0
    • Component/s: LevelDB
    • Labels:
      None
    • Environment:

      CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
      java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
      zookeeper-3.4.5.2

      Description

      I have clustered 3 ActiveMQ instances using replicated LevelDB and ZooKeeper. While running some tests through the web UI, I came across an issue that appears to corrupt the LevelDB data files.

      The issue can be reproduced by performing the following steps:
      1. Start 3 ActiveMQ nodes.
      2. Push a message to the master (Node1) and browse the queue using the web UI.
      3. Stop the master node (Node1).
      4. Push a message to the new master (Node2) and browse the queue using the web UI. The message summary and queue content are OK.
      5. Start Node1.
      6. Stop the master node (Node2).
      7. Browse the queue using the web UI on the new master (Node3). The message summary is OK; however, clicking on the queue shows no message details. An error (see below) is logged by the master, which then attempts a restart.

      From this point on, the database appears to be corrupted, and the same error occurs on each node indefinitely (shutdown/restart loop). The only workaround is to stop the nodes and clear the data files.

      However, when a message is pushed between steps 5 and 6, the error doesn’t occur.
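The steps above can be walked through with a small stand-alone model to check that the cluster never drops below a majority of nodes. This is only a sketch: it assumes the usual majority rule for replicated LevelDB (at least `replicas/2 + 1` nodes online) and uses `replicas = 3` as in the reporter's configuration. The class `QuorumSketch` and its methods are hypothetical helpers, not ActiveMQ code, and the model says nothing about what data each node holds.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the reproduction steps; not part of ActiveMQ.
class QuorumSketch {
    // Assumed rule: a majority of the configured replicas must be online.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // True when enough nodes are online to form a majority.
    static boolean hasQuorum(Set<String> online, int replicas) {
        return online.size() >= quorum(replicas);
    }

    static void report(String step, Set<String> online, int replicas) {
        System.out.println(step + " online=" + online
                + " quorum=" + hasQuorum(online, replicas));
    }

    public static void main(String[] args) {
        int replicas = 3; // matches replicas="3" in the report
        Set<String> online = new LinkedHashSet<>();

        online.add("Node1");
        online.add("Node2");
        online.add("Node3");
        report("step 1 (start all)", online, replicas);

        online.remove("Node1");
        report("step 3 (stop Node1)", online, replicas);

        online.add("Node1");
        report("step 5 (start Node1)", online, replicas);

        online.remove("Node2");
        report("step 6 (stop Node2)", online, replicas);
    }
}
```

Under that assumption, two of three nodes remain online after every step, so the failure in step 7 does not look like a simple loss of quorum.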

      =================================
      LevelDB configuration on the 3 instances:
      <persistenceAdapter>
        <replicatedLevelDB
          directory="${activemq.data}/leveldb"
          replicas="3"
          bind="tcp://0.0.0.0:0"
          zkAddress="zkserver:2181"
          zkPath="/activemq/leveldb-stores"
        />
      </persistenceAdapter>

      =================================
      The error is:
      INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
      java.io.IOException
      at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
      at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
      at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
      at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
      at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
      at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
      at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
      at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
      at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
      at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
      at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
      at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
      at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
      at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
      at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
      at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)
      Caused by: java.lang.NullPointerException
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
      at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
      at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
      at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
      at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
      at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
      ... 17 more
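The trace shows a message-less `java.io.IOException` whose cause is a `NullPointerException`, which is what a wrap-and-rethrow helper produces when the underlying exception carries no message. The following stdlib-only sketch illustrates that pattern and how to get back to the root cause when reading such logs. `WrapSketch`, `wrap` and `rootCause` are hypothetical names chosen for illustration; this is not the source of `IOExceptionSupport` or `LevelDBClient`.

```java
import java.io.IOException;

// Hypothetical stand-in for the wrapping seen in the trace; not ActiveMQ's source.
class WrapSketch {
    // Wraps any failure in an IOException, copying the cause's message
    // (which is null for an explicitly constructed, message-less NPE).
    static IOException wrap(Throwable cause) {
        IOException e = new IOException(cause.getMessage());
        e.initCause(cause);
        return e;
    }

    // Walks the cause chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        IOException wrapped = wrap(new NullPointerException());
        System.out.println(wrapped);
        System.out.println("root cause: " + rootCause(wrapped).getClass().getName());
    }
}
```

When triaging this issue, the "Caused by" section is therefore the part to look at: the outer `IOException` only marks where the failure was wrapped.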

        Attachments

        1. LevelDBCorrupted.zip
          18 kB
          Guillaume
        2. activemq.xml
          6 kB
          Tenzin giatso

              People

              • Assignee:
                chirino Hiram R. Chirino
                Reporter:
                Gnome Guillaume
              • Votes:
                0
                Watchers:
                7
