Uploaded image for project: 'Qpid'
  1. Qpid
  2. QPID-6997

[Java Broker, BDBStore] HA JE Transaction commit might end up in NullPointerException when commit is invoked when majority is lost

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      On running some tests for BDB HA with default settings (ReplicaAckPolicy=SIMPLE_MAJORITY) involving publishing messages in parallel from multiple connections and disconnecting replica nodes in the middle of publishing transactions, in some rare circumstances the publishing transaction JE commit failed with NPE thrown from JE code as below:

      ########################################################################
      #
      # Unhandled Exception java.lang.NullPointerException in Thread IO-/127.0.0.1:35408
      #
      # Exiting
      #
      ########################################################################
      java.lang.NullPointerException
       at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:130)
       at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1130)
       at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:191)
       at com.sleepycat.je.txn.Txn.commit(Txn.java:756)
       at com.sleepycat.je.Transaction.doCommit(Transaction.java:609)
       at com.sleepycat.je.Transaction.commit(Transaction.java:410)
       at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commitAsync(ReplicatedEnvironmentFacade.java:302)
       at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.commitTranAsyncImpl(AbstractBDBMessageStore.java:794)
       at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.access$1200(AbstractBDBMessageStore.java:74)
       at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$BDBTransaction.commitTranAsync(AbstractBDBMessageStore.java:1364)
       at org.apache.qpid.server.txn.LocalTransaction.commitAsync(LocalTransaction.java:399)
       at org.apache.qpid.server.protocol.v0_8.AMQChannel.commit(AMQChannel.java:1220)
       at org.apache.qpid.server.protocol.v0_8.AMQChannel.receiveTxCommit(AMQChannel.java:3622)
       at org.apache.qpid.codec.ServerDecoder.processMethod(ServerDecoder.java:228)
       at org.apache.qpid.codec.AMQDecoder.processFrame(AMQDecoder.java:191)
       at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.doProcessFrame(BrokerDecoder.java:114)
       at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.access$000(BrokerDecoder.java:36)
       at org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:78)
       at org.apache.qpid.server.protocol.v0_8.BrokerDecoder$1.run(BrokerDecoder.java:74)
       at java.security.AccessController.doPrivileged(Native Method)
       at org.apache.qpid.server.protocol.v0_8.BrokerDecoder.processFrame(BrokerDecoder.java:73)
       at org.apache.qpid.codec.AMQDecoder.processInput(AMQDecoder.java:173)
       at org.apache.qpid.codec.AMQDecoder.decode(AMQDecoder.java:114)
       at org.apache.qpid.codec.ServerDecoder.decodeBuffer(ServerDecoder.java:43)
       at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:266)
       at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8$1.run(AMQPConnection_0_8.java:258)
       at java.security.AccessController.doPrivileged(Native Method)
       at org.apache.qpid.server.protocol.v0_8.AMQPConnection_0_8.received(AMQPConnection_0_8.java:257)
       at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:142)
       at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:547)
       at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
       at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:446)
       at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:253)
       at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:108)
       at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:499)
       at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:337)
       at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:86)
       at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:457)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
      

      It seems that majority was lost in the middle of transaction. Qpid HA logic detected that and restarted the environment, however, the transaction was not aborted.

      On closing of the environment the following has been reported:

      Problem: 3 locks left
      ---- LSN: 0x0/0x2d80----
       ThinLockAddr:1794979774 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
      ---- LSN: 0x0/0x1ed8----
       ThinLockAddr:374788562 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
      ---- LSN: 0x0/0x20db----
       ThinLockAddr:501824883 Owner:1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn Waiters: (none)
      [BROKER-1] o.a.q.t.u.SpawnedBrokerHolder
      Problem: 1 txns left
      txnId = -50 txnName = 1919034342 -50_Selector-virtualhost-test-iopool_MasterTxn
      Local Cache Usage = 240
      Cache Layout: Allocation of resources in the cache.
       adminBytes=0
       cacheTotalBytes=1,622,051
       dataAdminBytes=0
       dataBytes=0
       lockBytes=240
       sharedCacheTotalBytes=0
      WARN  [Environment-test:nodetestTransferMasterWhilstTransactionsInFlight10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Ignoring an exception whilst closing environment
      com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) nodetestTransferMasterWhilstTransactionsInFlight10001(1):/tmp/qpid-work-org.apache.qpid.server.store.berkeleydb.replication.MultiNodeTest.testTransferMasterWhilstTransactionsInFlight-1-8967653656689695718/test/config Problem closing handle nodetestTransferMasterWhilstTransactionsInFlight10001(1) UNEXPECTED_EXCEPTION: Unexpected internal Exception, may have side effects. Environment is invalid and must be closed.
              at com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:709) ~[je-5.0.104.jar:5.0.104]
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.closeEnvironmentOnRestart(ReplicatedEnvironmentFacade.java:1277) [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.restartEnvironment(ReplicatedEnvironmentFacade.java:1253) [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.access$500(ReplicatedEnvironmentFacade.java:92) [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$3.run(ReplicatedEnvironmentFacade.java:481) [qpid-bdbstore-6.1.0-SNAPSHOT.jar:6.1.0-SNAPSHOT]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
      Caused by: java.lang.IllegalStateException: Unclosed Database: MESSAGE_CONTENT
      Unclosed Database: MESSAGE_CONTENT
      Databases left open: 2
      There is 1 existing transaction opened against the Environment.
      Aborting open transactions ...
      aborting <Transaction id="-46">
              at com.sleepycat.je.Environment.close(Environment.java:390) ~[je-5.0.104.jar:5.0.104]
              at com.sleepycat.je.rep.ReplicatedEnvironment.close(ReplicatedEnvironment.java:702) ~[je-5.0.104.jar:5.0.104]
              ... 7 common frames omitted
      

      It appears we need to abort all transactions whilst restarting the environment

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            orudyy Alex Rudyy
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment