QPID-8276

[Broker-J] Broker can leak closed NonBlockingConnection objects and eventually run out of heap memory

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • qpid-java-broker-7.0.3, qpid-java-broker-7.0.2, qpid-java-broker-7.0.0, qpid-java-broker-7.0.1, qpid-java-6.1.7, qpid-java-broker-7.1.0, qpid-java-broker-7.0.4, qpid-java-broker-7.0.5, qpid-java-broker-7.0.6
    • Broker-J
    • None

    Description

      The Qpid Broker-J can leak closed NonBlockingConnection objects.

      Heap dump analysis of an impacted broker instance revealed that the leaked NonBlockingConnection objects accumulate in SelectorThread.SelectionTask#_unscheduledConnections belonging to the AMQP port IO pool. They have no ticker set and no state-changed flag set (NonBlockingConnection#isStateChanged() == false). As a result, the NonBlockingConnection objects are not removed from SelectorThread.SelectionTask#_unscheduledConnections when SelectorThread.SelectionTask#processUnscheduledConnections() is called from SelectorThread.SelectionTask#performSelect().
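
      The retention condition described above can be illustrated with a small stand-alone model. This is only a sketch based on the description in this report, not the Broker-J source: ModelConnection, UnscheduledSweepModel, tickerDue, park() and sweep() are hypothetical names, and the real sweep logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for NonBlockingConnection; NOT the Broker-J class.
final class ModelConnection {
    final String name;
    final boolean tickerDue;     // assumption: models "a ticker is set and has expired"
    final boolean stateChanged;  // models NonBlockingConnection#isStateChanged()

    ModelConnection(String name, boolean tickerDue, boolean stateChanged) {
        this.name = name;
        this.tickerDue = tickerDue;
        this.stateChanged = stateChanged;
    }
}

// Simplified model of the unscheduled-connection sweep described in the report.
final class UnscheduledSweepModel {
    private final Set<ModelConnection> unscheduled = new LinkedHashSet<>();

    void park(ModelConnection connection) {
        unscheduled.add(connection);
    }

    // Removes and returns only the connections that have a reason to run.
    List<ModelConnection> sweep() {
        List<ModelConnection> toBeScheduled = new ArrayList<>();
        Iterator<ModelConnection> it = unscheduled.iterator();
        while (it.hasNext()) {
            ModelConnection c = it.next();
            if (c.tickerDue || c.stateChanged) {
                toBeScheduled.add(c);
                it.remove();
            }
            // Otherwise the connection stays parked, even if it is already closed.
        }
        return toBeScheduled;
    }

    int parkedCount() {
        return unscheduled.size();
    }
}
```

      In this model, a connection parked with neither flag set survives every call to sweep(), which matches the accumulation seen in the heap dump.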

      The NonBlockingConnection and the underlying model object are in the closed state.
      It seems that the leaked NonBlockingConnection was closed during an invocation of NonBlockingConnection#doWork(). The connection was unregistered from the VirtualHost IO pool and re-registered with the port IO pool as part of the invocation of NetworkConnectionScheduler#processConnection. At first, it was stored in the collection SelectorThread.SelectionTask#_unregisteredConnections. Later, it was moved from SelectorThread.SelectionTask#_unregisteredConnections to SelectorThread.SelectionTask#_unscheduledConnections as part of the invocation of SelectorThread.SelectionTask#reregisterUnregisteredConnections, and it stayed stuck there afterwards.

      The leaked connection used the TLS transport, but I think that connections using the plain transport can be leaked as well.

      I suspect that the connections were leaked as a result of the following scenario:

      • The invocation of SocketChannel#read(java.nio.ByteBuffer[]) returned -1 in NonBlockingConnection#readFromNetwork.
      • The flag NonBlockingConnection#_closed was set to true. The method ProtocolEngine#notifyWork() was not invoked to set the state-changed flag to true.
      • The execution of NonBlockingConnection#doWork() ended in connection shutdown (due to NonBlockingConnection#_closed being set), followed by rescheduling of the connection on the port IO scheduler. The latter resulted in the connection being put into SelectorThread.SelectionTask#_unscheduledConnections as described above.
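
      The suspected sequence can be sketched as a tiny state model. Everything here is hypothetical and only mirrors the bullets above: EofModelConnection, onReadResult() and wouldStayParked() are illustrative names, and the notifyOnEof switch is an assumed remedy used for contrast, not a description of how the defect was actually fixed.

```java
// Hypothetical model of the end-of-stream path; NOT the Broker-J implementation.
final class EofModelConnection {
    private final boolean notifyOnEof; // assumption: a possible remedy, for contrast only
    private boolean closed;            // models NonBlockingConnection#_closed
    private boolean stateChanged;      // models the flag set via ProtocolEngine#notifyWork()

    EofModelConnection(boolean notifyOnEof) {
        this.notifyOnEof = notifyOnEof;
    }

    // Models the outcome of a socket read: -1 means the peer closed the stream.
    void onReadResult(int bytesRead) {
        if (bytesRead == -1) {
            closed = true;
            if (notifyOnEof) {
                stateChanged = true; // hypothetical: signal that work is pending
            }
        }
    }

    boolean isClosed() {
        return closed;
    }

    boolean isStateChanged() {
        return stateChanged;
    }

    // A closed connection with no state change (and no ticker) has nothing
    // that would cause it to be picked up from the unscheduled collection.
    boolean wouldStayParked() {
        return closed && !stateChanged;
    }
}
```

      With notifyOnEof set to false the model reproduces the reported state (closed, state-changed flag false); with it set to true the connection would be picked up again on the next sweep.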

      It seems that frequently opening and closing connections with a connection life span longer than 10 s (required for the tickers to be removed) can end up leaking connections as described in the scenario above. It looks like connections that are closed in an orderly manner, or closed as a result of an IOException thrown from a socket read/write operation, are not affected by the defect.

      The impacted broker instance can eventually crash with an out-of-memory error. Broker memory monitoring and periodic broker restarts can mitigate the issue.
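
      As an illustration of the monitoring part of the mitigation, heap usage can be read through the standard java.lang.management API. This is a minimal sketch, assuming the check runs inside the JVM being monitored (for a broker it would more likely be done remotely over JMX); the HeapWatch class and the 0.85 threshold are hypothetical choices, not Broker-J settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper; the class name and the threshold are illustrative only.
final class HeapWatch {

    // Returns the used fraction of the maximum heap, or -1.0 if the maximum is undefined.
    static double usedFraction(MemoryUsage usage) {
        long max = usage.getMax();
        if (max <= 0) {
            return -1.0;
        }
        return (double) usage.getUsed() / (double) max;
    }

    // True when the used fraction is known and has reached the threshold.
    static boolean aboveThreshold(MemoryUsage usage, double threshold) {
        double fraction = usedFraction(usage);
        return fraction >= 0.0 && fraction >= threshold;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        if (aboveThreshold(heap, 0.85)) {
            System.out.println("Heap usage high: consider a planned restart");
        } else {
            System.out.println("Heap usage within limits");
        }
    }
}
```

      A steadily rising used fraction after full garbage collections would be consistent with the leak described in this issue, although it does not identify the cause by itself.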

          People

            orudyy Alex Rudyy