Description
Environment Details:
- Setup: Artemis Broker, version 2.37
Issue Description: The setup is a hub-and-spoke layout with one central Artemis broker (the hub) and many Artemis brokers connecting to it (the spokes). The brokers are connected by core bridges between queues on the spokes and queues on the hub: 10 core bridges from spoke to hub and 10 from hub to spoke, totalling 20 connections per spoke. There are 200 spokes in this test.
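For reference, each such link is a standard Artemis core bridge. A minimal sketch of one spoke-to-hub bridge in the spoke's broker.xml might look like the following (the queue and connector names are illustrative, not taken from the actual test setup):

```xml
<!-- Illustrative core bridge definition inside <core> of broker.xml.
     "orders" and "hub-connector" are placeholder names. -->
<bridges>
   <bridge name="spoke-to-hub-orders">
      <queue-name>orders</queue-name>
      <forwarding-address>orders</forwarding-address>
      <!-- retry forever, which is what triggers reconnects after a restart -->
      <reconnect-attempts>-1</reconnect-attempts>
      <static-connectors>
         <connector-ref>hub-connector</connector-ref>
      </static-connectors>
   </bridge>
</bridges>
```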
When an Artemis spoke broker (a broker making connections to the monitored hub broker) is either forcibly terminated (killed) or gracefully stopped, and then started again, we observe a significant increase in memory usage within the hub Artemis broker: approximately 200 MB per restarted spoke broker. This indicates a resource/memory leak.
Fault scenario: After a spoke broker is restarted, the memory allocated by the hub Artemis broker continues to grow and is never released. This increase persists and can eventually lead to memory exhaustion, destabilizing the entire system. The heap dump suggests that the leak occurs around connections initiated in the hub-to-spoke direction, but this remains to be confirmed.
Technical Details:
- Observations:
- A heap memory dump was taken and analyzed.
- The issue appears to originate from the org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl class within the Artemis broker codebase.
- This class seems to fail to release resources properly when the client broker is terminated, likely due to unreleased connections or buffers.
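Based on the heap dump, a plausible (but unconfirmed) shape of the leak can be sketched in plain Java. The class and method names below are illustrative stand-ins, not the actual Artemis code: the suspicion is that each reconnect attempt creates a fresh session factory and leaves it reachable from a long-lived collection, while the previous instance is never closed or removed.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the suspected leak pattern. SessionFactory stands in
// for ClientSessionFactoryImpl; the static list stands in for the
// ClusterManager-owned HashMap seen in the heap dump.
public class BridgeReconnectLeakSketch {

    static class SessionFactory {
        final byte[] buffers = new byte[1024]; // retained connection state
        boolean closed = false;
        void close() { closed = true; }
    }

    static final List<SessionFactory> factories = new ArrayList<>();

    // Leaky variant: the old factory stays reachable after every reconnect.
    static void reconnectLeaky() {
        factories.add(new SessionFactory()); // previous entry never removed
    }

    // Fixed variant: close and drop the old factory before creating a new one.
    static void reconnectFixed() {
        if (!factories.isEmpty()) {
            factories.remove(factories.size() - 1).close();
        }
        factories.add(new SessionFactory());
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) reconnectLeaky();
        System.out.println("after 10 leaky reconnects: " + factories.size());

        factories.clear();
        for (int i = 0; i < 10; i++) reconnectFixed();
        System.out.println("after 10 fixed reconnects: " + factories.size());
    }
}
```

With the leaky variant, ten reconnects leave ten retained factories; with the fixed variant the count stays at one, which is the behavior one would expect from the hub after a spoke restart.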
Affected version:
- The issue is present in Artemis version 2.37.
Steps to Reproduce:
- Start Artemis spoke brokers and a hub Artemis broker using the specified versions.
- Wait for them to establish all the core bridge connections.
- Forcefully terminate (kill) or gracefully stop the Artemis spoke broker.
- Start the spoke broker again and see it re-establish the connections.
- Monitor the memory usage of the hub Artemis broker over time.
- Observe the continuous increase in memory usage.
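To make the monitoring step concrete, heap usage can be sampled with the standard java.lang.management API. In our tests the hub JVM was observed externally; this in-process snippet is just an equivalent sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Periodically print the JVM's used heap, in MB. Attach-style tools
// (jcmd, JMX) report the same MemoryMXBean figures from outside.
public class HeapMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < 3; i++) {
            long usedMb = mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
            System.out.println("heap used: " + usedMb + " MB");
            Thread.sleep(1000);
        }
    }
}
```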
Additional Information:
We have captured a heap dump from such a hub broker with around 450 spokes, after it had consumed about 5 GB of heap.
- Memory Dump Report:
- 144,733 instances of org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl, loaded by java.net.URLClassLoader @ 0x6c81acd70, occupy 4,535,785,712 (85.38%) bytes.
- Most of these instances are referenced from one instance of java.util.HashMap$Node[], loaded by <system class loader>, which occupies 141,584 (0.00%) bytes. This instance is referenced by org.apache.activemq.artemis.core.server.cluster.ClusterManager @ 0x6c1ed4b60, loaded by java.net.URLClassLoader @ 0x6c81acd70.
- The thread org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread @ 0x6c2c1c340 activemq-failure-check-thread has a local variable or reference to org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl @ 0x6c2c1c910, which is on the shortest path to java.util.HashMap$Node[8192] @ 0x710f30780.
- The thread org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread @ 0x6c2c1c340 activemq-failure-check-thread keeps local variables with a total size of 960 (0.00%) bytes.
- The stack trace of this thread is available and includes details of involved local variables.
Heap dump usage:
The increase in heap memory is marked by rectangles in the attached pictures.
Attachments
Issue Links
- duplicates
  - ARTEMIS-5017 Bridge leaks ClientSessionFactory instance on reconnect attempt (Closed)