ActiveMQ Artemis / ARTEMIS-5027

Bug Report: Memory Leak in Artemis MQ when spokes disconnect


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.37.0
    • Fix Version/s: None
    • Component/s: Broker
    • Labels: None
    • Environment: Oracle Linux Server release 9.4, 4 CPU cores, 16 GB of RAM (max JVM heap 10 GB)

    Description

      Environment Details:

      • Setup: Artemis Broker, version 2.37

      Issue Description: The setup is a hub-and-spoke layout with one central Artemis broker (the hub) and many Artemis brokers connecting to it (the spokes). The brokers are connected using core bridges between queues on the spokes and queues on the hub. There are 10 core bridges from spoke to hub and 10 core bridges from hub to spoke, totalling 20 bridge connections per spoke. There are 200 spokes in this test.
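
      For reference, a minimal sketch of what one such spoke-to-hub core bridge could look like in the spoke's broker.xml is shown below. The connector, queue, and address names are illustrative placeholders, not the actual configuration of this deployment:

      <connectors>
         <connector name="hub-connector">tcp://hub-host:61616</connector>
      </connectors>

      <bridges>
         <bridge name="spoke-to-hub-1">
            <!-- local queue on the spoke whose messages are forwarded -->
            <queue-name>spoke.outbound.1</queue-name>
            <!-- target address on the hub broker -->
            <forwarding-address>hub.inbound.1</forwarding-address>
            <!-- retry indefinitely so the bridge reconnects after either side restarts -->
            <reconnect-attempts>-1</reconnect-attempts>
            <static-connectors>
               <connector-ref>hub-connector</connector-ref>
            </static-connectors>
         </bridge>
      </bridges>

      The hub-to-spoke bridges would be defined analogously on the hub, each pointing at a connector for the corresponding spoke.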

      When an Artemis spoke broker (i.e. a broker that initiates connections to the monitored hub broker) is either forcibly terminated (killed) or gracefully stopped and then started again, we observe a significant increase in memory usage within the hub Artemis broker. Memory consumption grows by approximately 200 MB per restarted spoke broker, which indicates a resource/memory leak.

      Fault scenario: After the spoke broker is restarted, the memory allocated by the hub Artemis broker continues to grow without being released. This increase persists and can eventually lead to memory exhaustion, which could destabilize the entire system. The heap dump suggests that the leak is related to the connections initiated in the hub-to-spoke direction, but this still needs to be confirmed.

      Technical Details:

      • Observations:
      • A heap memory dump was taken and analyzed.
      • The issue appears to originate from the org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl class within the Artemis broker codebase.
      • This class seems to fail to release its resources properly when the remote (spoke) broker is terminated, likely due to unreleased connections or buffers; see the lifecycle sketch after this list.
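
      To illustrate what releasing these resources means, below is a minimal sketch of the public core client API that ClientSessionFactoryImpl implements. It only shows the expected close() path for a factory and its locator; it is not the broker's internal bridge code, and the spoke address is a placeholder:

      import org.apache.activemq.artemis.api.core.client.ActiveMQClient;
      import org.apache.activemq.artemis.api.core.client.ClientSession;
      import org.apache.activemq.artemis.api.core.client.ClientSessionFactory;
      import org.apache.activemq.artemis.api.core.client.ServerLocator;

      public class SessionFactoryLifecycle {
         public static void main(String[] args) throws Exception {
            // ClientSessionFactoryImpl is the implementation behind ClientSessionFactory.
            // Each instance holds a live connection plus buffers, so it should be closed
            // once the remote peer is gone; otherwise instances accumulate on the heap.
            ServerLocator locator = ActiveMQClient.createServerLocator("tcp://spoke-host:61616"); // placeholder spoke address
            ClientSessionFactory factory = locator.createSessionFactory();
            ClientSession session = factory.createSession();
            try {
               // ... bridge traffic would flow over sessions like this one ...
            } finally {
               session.close();  // release the session
               factory.close();  // release the underlying connection and buffers
               locator.close();  // release locator-level resources (threads, topology listeners)
            }
         }
      }

      The leaked ClientSessionFactoryImpl instances in the heap dump appear to be factories for which this close path is never reached after a spoke restart.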

      Affected version:

      • The issue is present in Artemis version 2.37.

      Steps to Reproduce:

      1. Start the Artemis spoke brokers and the hub Artemis broker using the specified version (2.37).
      2. Wait for them to establish all the core bridge connections.
      3. Forcefully terminate (kill) or gracefully stop the Artemis spoke broker.
      4. Start the spoke broker again and verify that it re-establishes the bridge connections.
      5. Monitor the heap usage of the hub Artemis broker over time (for example via JMX, as sketched after this list).
      6. Observe the continuous increase in memory usage, roughly 200 MB per restarted spoke.
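
      One way to carry out step 5 is to poll the hub JVM's heap usage over JMX. A minimal sketch follows; it assumes remote JMX has been enabled on the hub broker, and the host name and port 1099 are placeholders:

      import java.lang.management.ManagementFactory;
      import java.lang.management.MemoryMXBean;
      import java.lang.management.MemoryUsage;
      import javax.management.MBeanServerConnection;
      import javax.management.remote.JMXConnector;
      import javax.management.remote.JMXConnectorFactory;
      import javax.management.remote.JMXServiceURL;

      public class HubHeapMonitor {
         public static void main(String[] args) throws Exception {
            // Placeholder JMX endpoint of the hub broker's JVM.
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://hub-host:1099/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
               MBeanServerConnection connection = connector.getMBeanServerConnection();
               MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                     connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
               // Sample heap usage once a minute; after each spoke restart the used heap
               // is expected to step up by roughly 200 MB if the leak is present.
               for (int i = 0; i < 60; i++) {
                  MemoryUsage heap = memory.getHeapMemoryUsage();
                  System.out.printf("heap used = %d MB, committed = %d MB%n",
                        heap.getUsed() / (1024 * 1024), heap.getCommitted() / (1024 * 1024));
                  Thread.sleep(60_000);
               }
            }
         }
      }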

      Additional Information:

      We have created a heap memory dump from such a hub broker with around 450 spokes, taken after it had exhausted about 5 GB of heap.

      • Memory Dump Report:
      • 144,733 instances of org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl, loaded by java.net.URLClassLoader @ 0x6c81acd70, occupy 4,535,785,712 (85.38%) bytes.
      • Most of these instances are referenced from one instance of java.util.HashMap$Node[], loaded by <system class loader>, which occupies 141,584 (0.00%) bytes. This instance is referenced by org.apache.activemq.artemis.core.server.cluster.ClusterManager @ 0x6c1ed4b60, loaded by java.net.URLClassLoader @ 0x6c81acd70.
      • The thread org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread @ 0x6c2c1c340 activemq-failure-check-thread has a local variable or reference to org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl @ 0x6c2c1c910, which is on the shortest path to java.util.HashMap$Node[8192] @ 0x710f30780.
      • The thread org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread @ 0x6c2c1c340 activemq-failure-check-thread keeps local variables with a total size of 960 (0.00%) bytes.
      • The stack trace of this thread is available and includes details of involved local variables.

      Heap usage over time:
      The increase in heap memory is marked by rectangles in the attached screenshots.

      Attachments

        1. G1oldGen.png (43 kB, Dragan Jankovic)
        2. Heamspace.png (50 kB, Dragan Jankovic)
        3. JConsole.png (53 kB, Dragan Jankovic)


          People

            Assignee: Justin Bertram (jbertram)
            Reporter: Dragan Jankovic (dragan.j.flipside)
            Votes: 0
            Watchers: 3

