ActiveMQ
  1. ActiveMQ
  2. AMQ-3021

HttpTunnelServlet leaks BlockingQueueTransport objects, causing eventual OOM on heap space

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 5.4.1
    • Fix Version/s: 5.4.2
    • Component/s: Broker, Transport
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Symptom
      ========
      We have a production system involving a network of 8 Brokers connected over HTTP. The Brokers discover each other using SimpleDiscoveryAgent. Our network experienced a period of instability during which time numerous Broker-to-Broker bridges were created and failed repeatedly. Over the course of about 7 hours, two of the Brokers crashed with OOM heap space errors.

      We analyzed the heap dump and discovered several thousand instances of org.apache.activemq.transport.http.BlockingQueueTransport. These transports were associated with bridges that had failed, however, they were not being garbage collected because HttpTunnelServlet was maintaining references to them.

      This issue was easily replicated in a test environment were we repeatedly broke the connection between a pair of Brokers connected over HTTP. In each case, both Brokers maintained indefinitely a number of instances of BlockingQueueTransport equal to the number of times the network was interrupted.

      Cause
      =====
      When a bridge is first created over HTTP, the client broker's HttpClientTransport sends a HEAD command to the server broker, which is processed by an instance of HttpTunnelServlet. In response,e HttpTunnelServlet creates an instance of BlockingQueueTransport to represent the connection to the client broker. This instance of BlockingQueueTransport is stored in a private hash map managed by HttpTunnelServlet and indexed by the client's unique ID:

      public class HttpTunnelServlet extends HttpServlet {
      ...
      private final Map<String, BlockingQueueTransport> clients = new HashMap<String, BlockingQueueTransport>();
      ...

      protected BlockingQueueTransport createTransportChannel(HttpServletRequest request, HttpServletResponse response) throws IOException {
      String clientID = request.getHeader("clientID");
      ...
      answer = createTransportChannel();
      clients.put(clientID, answer);
      ...

      Every time a client broker reestablishes a bridge, it generates a new clientID. As a result, the clients hash map accumulates instances of BlockingQueueTransport, one for each bridge created. Nowhere in the implementation of HttpTunnelServlet is there any code that removes the instance when a client broker is no longer connected. In an environment with multiple brokers and an unreliable network, the client hash map can accumulate thousands of instances of BlockingQueueTransport.

      Solution
      =======
      HttpTunnelServlet needs to remove an instance of BlockingQueueTransport from the clients hash map whenever that instance is no longer being used. The addition of InactivityMonitor as a default interceptor for the BlockingQueueTransport (see AMQ-2764) is a partial solution in that it triggers the closure of unused BlockingQueueTransport instances; however, HttpTunnelServlet does not detect these closures.

      The solution is included a patch and involves the following changes to HttpTunnelServlet (not all changes are directly related to the OOM):
      1) The addition of a ServiceListener to the BlockingQueueTransport, which is triggered when the transport is closed and causes the removal of the transport from the clients hash map
      2) Refactoring of the access to the clients hash map to simplify thread safety (in particularly, removal of explicit synchronization in lieue of ConcurrentHashMap)
      3) An additional check on the BlockingQueueTransport to ensure that it was not prematurely closed (the previous code ignored this possibility)

      1. BlockingQueueTransport.java
        3 kB
        Stirling Chow
      2. BlockingQueueTransportLeakTest.java
        6 kB
        Stirling Chow
      3. patch.txt
        7 kB
        Stirling Chow

        Activity

        Hide
        Stirling Chow added a comment -

        Patch for HttpTunnelServlet.java

        Show
        Stirling Chow added a comment - Patch for HttpTunnelServlet.java
        Hide
        Stirling Chow added a comment -

        A unit test (requires slightly modified BlockingQueueTransport) that demonstrates the leak. The unit test fails with unpatched HttpTunnelServlet.

        Show
        Stirling Chow added a comment - A unit test (requires slightly modified BlockingQueueTransport) that demonstrates the leak. The unit test fails with unpatched HttpTunnelServlet.
        Hide
        Dejan Bosanac added a comment -

        Committed with svn revision 1032539. Thanks for the patch!

        Show
        Dejan Bosanac added a comment - Committed with svn revision 1032539. Thanks for the patch!
        Hide
        Stirling Chow added a comment - - edited

        Dejan, the addition of the CountdownLatch to BlockingQueueTransport should not be checked in — it was only there to demonstrate the issue through the unit test. In its present state, a NullPointerException is likely to occur at runtime because the finalize() method will be triggered, but finalizeLatch will not have been initialized.

        If you want to include the unit test, the existing code will need to be changed to use a different technique for determining the leaking BlockingQueueTransport. Somehow you'll need access to the instance of HttpTunnelServlet, publically expose the clients hashmap, and then assert against the number of elements in the hashmap.

        Unfortunately, the HttpTunnelServlet/BlockingQueueTransport class structure was not set up to support this kind of unit test, so unless you know a way to determine the number of instances of a class that are loaded in the heap, you'll have to add some testing specific code, which seems rather invasive.

        ======

        To summarize, the patch that I submitted should be applied, and it should only affect HttpTunnelServlet. BlockingQueueTransport should be left as-is and the unit test should not be checked in.

        Show
        Stirling Chow added a comment - - edited Dejan, the addition of the CountdownLatch to BlockingQueueTransport should not be checked in — it was only there to demonstrate the issue through the unit test. In its present state, a NullPointerException is likely to occur at runtime because the finalize() method will be triggered, but finalizeLatch will not have been initialized. If you want to include the unit test, the existing code will need to be changed to use a different technique for determining the leaking BlockingQueueTransport. Somehow you'll need access to the instance of HttpTunnelServlet, publically expose the clients hashmap, and then assert against the number of elements in the hashmap. Unfortunately, the HttpTunnelServlet/BlockingQueueTransport class structure was not set up to support this kind of unit test, so unless you know a way to determine the number of instances of a class that are loaded in the heap, you'll have to add some testing specific code, which seems rather invasive. ====== To summarize, the patch that I submitted should be applied, and it should only affect HttpTunnelServlet. BlockingQueueTransport should be left as-is and the unit test should not be checked in.
        Hide
        Dejan Bosanac added a comment -

        Removed unnecessary stuff. Thanks. Would be good to have a test to protect this in the future though!

        Show
        Dejan Bosanac added a comment - Removed unnecessary stuff. Thanks. Would be good to have a test to protect this in the future though!

          People

          • Assignee:
            Dejan Bosanac
            Reporter:
            Stirling Chow
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development