Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.32, qpid-java-6.0.6, qpid-java-6.1.1, qpid-java-broker-7.0.0
-
None
Description
This is the scenario:
BDB message store.
AsyncRecovery completes recovery of queue Q1. Queue Q2 is still being recovered.
An transacted application sends a message to Q1 but does not commit the transaction. This ends up creating an entry in the DeliveryDb through the following call chain:
> AMQChannel#deliverCurrentMessageIfComplete
--> AbstractExchange#send
----> LocalTransaction#enqueue(List<? extends BaseQueue>, EnqueueableMessage, ServerTransaction.Action)
------> AbstractBDBMessageStore.BDBTransaction#enqueueMessage
--------> AbstractBDBMessageStore#enqueueMessage
----------> getDeliveryDb().put
The transaction will hold a write lock on this new DB entry until the transaction is committed.
If it now happens that this entry appears immediately after the last entry of Q2 in the database the recovery code will try to read that entry in AbstractBDBMessageStore#visitMessageInstances(TransactionLogResource, MessageInstanceHandler):
cursor.getNext(key, value, LockMode.DEFAULT)
Acquisition of the read lock will timeout after je.lock.timeout (default: 500ms) and after exhaustion of the retries (hard coded to 5) a StoreException will be thrown bringing down the broker with the following stack trace (from a broker derived from 0.32):
######################################################################## # # Unhandled Exception com.google.common.util.concurrent.UncheckedExecutionException: org.apache.qpid.server.store.StoreException: Cannot visit messages in Thread pool-2-thread-1 # # Exiting # ######################################################################## com.google.common.util.concurrent.UncheckedExecutionException: org.apache.qpid.server.store.StoreException: Cannot visit messages at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1358) at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1344) at org.apache.qpid.server.virtualhost.AbstractVirtualHost$4.run(AbstractVirtualHost.java:1619) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.qpid.server.store.StoreException: Cannot visit messages at org.apache.qpid.server.store.berkeleydb.StandardEnvironmentFacade.handleDatabaseException(StandardEnvironmentFacade.java:369) at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.sleepOrThrowOnLockConflict(AbstractBDBMessageStore.java:665) at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.visitMessageInstances(AbstractBDBMessageStore.java:233) at org.apache.qpid.server.virtualhost.AsynchronousMessageStoreRecoverer$AsynchronousRecoverer.recoverQueue(AsynchronousMessageStoreRecoverer.java:163) at org.apache.qpid.server.virtualhost.AsynchronousMessageStoreRecoverer$AsynchronousRecoverer.access$900(AsynchronousMessageStoreRecoverer.java:86) at org.apache.qpid.server.virtualhost.AsynchronousMessageStoreRecoverer$AsynchronousRecoverer$QueueRecoveringTask.run(AsynchronousMessageStoreRecoverer.java:436) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) ... 3 more Caused by: com.sleepycat.je.LockTimeoutException: (JE 5.0.104) Lock expired. Locker 2109822146 -1_Queue Recoverer : Q2 (vh: default)_ThreadLocker: waited for lock on database=QUEUE_ENTRIES LockAddr:530828558 LSN=0x0/0xa45c type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1488294679798 endTime=1488294680298 Owners: [<LockInfo locker="879351915 243_IoRcvr-/127.0.0.1:5672-/127.0.0.1:51619_Txn" type="WRITE"/>] Waiters: [] at com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:665) at com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623) at com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97) at com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390) at com.sleepycat.je.txn.LockManager.lock(LockManager.java:276) at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118) at com.sleepycat.je.txn.Locker.lock(Locker.java:443) at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2642) at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2443) at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2272) at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1480) at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1607) at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2898) at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2732) at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2586) at com.sleepycat.je.Cursor.search(Cursor.java:2553) at com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1757) at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.visitMessageInstances(AbstractBDBMessageStore.java:203) ... 8 more
I believe the call to cursor.getSearchKeyRange(key,value, LockMode.DEFAULT) has the same problem just under different circumstances.
I have not checked whether other similar defects exist.