Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
qpid-java-broker-8.0.1
-
None
Description
There was deadlock (or [latch timeout|https://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#ENV_LATCH_TIMEOUT) in BDB database or latch was hold for more then timeout. Then was database environment closed, database transactions cancelled and broker initiated shutdown.
2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55550] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605191 currentThread: Thread[IO-/127.0.0.1:55550,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55580] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605060 currentThread: Thread[IO-/127.0.0.1:55580,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55584,5,main] 2020-08-19 00:42:41,955 ERROR [IO-/127.0.0.1:55544] (c.s.j.d.EnvironmentImpl) - default Thread dump follows for latch timeout: BIN2605128 currentThread: Thread[IO-/127.0.0.1:55544,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55548,5,main] .... 2020-08-19 00:42:42,174 ERROR [IO-/127.0.0.1:55538] (o.a.q.s.u.ServerScopedRuntimeException) - Error reading message metadata for message with id 52671357: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed. org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605191 currentThread: Thread[IO-/127.0.0.1:55538,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55522,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed. at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:443) at com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:211) .... at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) ... 5 more Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed. at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228) at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766) at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:3869) at com.sleepycat.je.Cursor.checkEnv(Cursor.java:5394) at com.sleepycat.je.Cursor.close(Cursor.java:540) at com.sleepycat.je.Database.get(Database.java:1344) ... 41 more [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605176 currentThread: Thread[IO-/127.0.0.1:55496,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55494,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed.] End of stringified Stacktrace ... 2020-08-19 00:42:42,201 ERROR [IO-/127.0.0.1:55498] (o.a.q.s.Main) - Uncaught exception, shutting down. org.apache.qpid.server.logging.logback.Logback1027WorkaroundTurboFilter$StringifiedException: org.apache.qpid.server.util.ServerScopedRuntimeException: org.apache.qpid.server.store.StoreException: Error reading message metadata for message with id 52671021: (JE 7.4.5) /var/lib/qpidd/cludisk/8080/messages Latch timeout. BIN2605177 currentThread: Thread[IO-/127.0.0.1:55498,5,main] currentTime: 1597790561955 exclusiveOwner: Thread[IO-/127.0.0.1:55554,5,main] UNEXPECTED_STATE_FATAL: Unexpected internal state, unable to continue. Environment is invalid and must be closed. at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:582) at java.security.AccessController.doPrivileged(Native Method)
BDB thread dump attached.
Altogether there were 16 threads handling BDB latch timeout (in stacktrace there was a call of com.sleepycat.je.latch.LatchSupport.handleTimeout() method).
2 of them – when sending messages:
java.lang.Thread.dumpThreads(Native Method) java.lang.Thread.getAllStackTraces(Thread.java:1610) com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:566) com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209) com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63) com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107) com.sleepycat.je.tree.IN.latchShared(IN.java:560) com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372) com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817) com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748) com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666) com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637) com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337) com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221) com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962) com.sleepycat.je.Cursor.putNotify(Cursor.java:2800) com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647) com.sleepycat.je.Cursor.putInternal(Cursor.java:2478) com.sleepycat.je.Cursor.putInternal(Cursor.java:830) com.sleepycat.je.Database.put(Database.java:1574) com.sleepycat.je.Database.put(Database.java:1627) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.storeMetaData(AbstractBDBMessageStore.java:591) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$600(AbstractBDBMessageStore.java:77) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.store(AbstractBDBMessageStore.java:1165) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction$1.run(AbstractBDBMessageStore.java:1344) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.doPreCommitActions(AbstractBDBMessageStore.java:1378) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1400) org.apache.qpid.server.txn.AsyncAutoCommitTransaction.enqueue(AsyncAutoCommitTransaction.java:294) org.apache.qpid.server.message.RoutingResult.send(RoutingResult.java:124) org.apache.qpid.server.protocol.v1_0.NodeReceivingDestination.send(NodeReceivingDestination.java:117) org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveDelivery(StandardReceivingLinkEndpoint.java:280) org.apache.qpid.server.protocol.v1_0.AbstractReceivingLinkEndpoint.receiveTransfer(AbstractReceivingLinkEndpoint.java:165) org.apache.qpid.server.protocol.v1_0.Session_1_0.receiveTransfer(Session_1_0.java:637) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receiveTransfer(AMQPConnection_1_0Impl.java:792) org.apache.qpid.server.protocol.v1_0.type.transport.Transfer.invoke(Transfer.java:295) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.received(AMQPConnection_1_0Impl.java:515) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receive$0(AMQPConnection_1_0Impl.java:467) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$146/229440539.run(Unknown Source) java.security.AccessController.doPrivileged(Native Method) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receive(AMQPConnection_1_0Impl.java:461) org.apache.qpid.server.protocol.v1_0.framing.FrameHandler.parse(FrameHandler.java:224) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1312) org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576) org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source) java.security.AccessController.doPrivileged(Native Method) org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571) org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135) org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611) org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58) org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source) java.lang.Thread.run(Thread.java:748)
14 of them – when receiving messages:
com.sleepycat.je.utilint.LoggerUtils.fullThreadDump(LoggerUtils.java:557) com.sleepycat.je.latch.LatchSupport.handleTimeout(LatchSupport.java:209) com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:63) com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107) com.sleepycat.je.tree.IN.latchShared(IN.java:560) com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372) com.sleepycat.je.tree.Tree.search(Tree.java:2275) com.sleepycat.je.tree.Tree.search(Tree.java:2172) com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951) com.sleepycat.je.Cursor.searchExact(Cursor.java:4195) com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056) com.sleepycat.je.Cursor.search(Cursor.java:3858) com.sleepycat.je.Cursor.getInternal(Cursor.java:1282) com.sleepycat.je.Database.get(Database.java:1342) com.sleepycat.je.Database.get(Database.java:1401) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.getMessageMetaData(AbstractBDBMessageStore.java:299) org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.getMetaData(AbstractBDBMessageStore.java:1077) org.apache.qpid.server.message.AbstractServerMessageImpl.isPersistent(AbstractServerMessageImpl.java:178) org.apache.qpid.server.queue.QueueEntryImpl.populateInstanceProperties(QueueEntryImpl.java:155) org.apache.qpid.server.queue.QueueEntryImpl.<init>(QueueEntryImpl.java:147) org.apache.qpid.server.queue.OrderedQueueEntry.<init>(OrderedQueueEntry.java:46) org.apache.qpid.server.queue.StandardQueueEntry.<init>(StandardQueueEntry.java:37) org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:48) org.apache.qpid.server.queue.StandardQueueEntryList.createQueueEntry(StandardQueueEntryList.java:26) org.apache.qpid.server.queue.OrderedQueueEntryList.add(OrderedQueueEntryList.java:92) org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1356) org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1289) org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135) org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309) org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84) org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588) org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1358) org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$147/1374598514.accept(Unknown Source) java.lang.Iterable.forEach(Iterable.java:75) org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1358) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1347) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$152/950450513.run(Unknown Source) java.security.AccessController.doPrivileged(Native Method) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1345) org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1326) org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576) org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$74/2141950108.run(Unknown Source) java.security.AccessController.doPrivileged(Native Method) org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571) org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135) org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611) org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58) org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1136768342.run(Unknown Source)
Before the thread dump in log were present massive “QUE-1005 : Dropped” messages (ca. 73.000) and many error messages “Connection closed with error : amqp:resource-limit-exceeded - local-idle-timeout expired” (191).
I’m not sure, that AbstractServerMessageImpl.isPersistent() call can cause a deadlock, because get() operation is invoked with LockMode.READ_UNCOMMITTED parameter. According to BDB documentation READ_UNCOMMITTED mode should be non-blocking.