Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.36.0
- Fix Version/s: None
- Component/s: None
Description
Configuration:
A cluster of three nodes (A, B, C) with message redistribution enabled.
Description:
When cluster connectivity is started, each Artemis node creates a $.artemis.internal queue for each of the other nodes in the cluster.
Messages pending redistribution are moved into these queues by Artemis.
On node C, if a poisonous (non-forwardable) message is added or moved to a "$.artemis.internal" queue, the following happens:
- the cluster connection bridge attempts to process the message
- the bridge fails at the beforeForward step, because the message lacks the properties essential for redistribution (no queue IDs), resulting in an exception
- the cluster connection and its consumers are immediately closed
- one second later, the cluster connection and consumers are re-created, which triggers the creation of a new "notif.*" queue on node B
This sequence repeats in a loop and causes continuously high CPU and disk usage on node B, as the "activemq.notification" address keeps accumulating messages in "notif.*" queues.
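The failure loop described above can be sketched as a small, self-contained simulation. All names here are hypothetical (this models the reported behaviour, not actual Artemis internals): a store-and-forward message without queue-ID properties makes every forward attempt fail, and every restart of the cluster connection registers one more "notif.*" queue on the remote node.

```python
# Hypothetical model of the reported failure loop (not Artemis code):
# a poison SNF message that lacks queue-ID routing properties makes the
# bridge fail in beforeForward, the cluster connection restarts, and
# each restart creates another "notif.*" queue on the remote node.

class PoisonMessageError(Exception):
    pass

def before_forward(message):
    # A forwardable SNF message must carry the IDs of the queues it is
    # destined for; without them the bridge cannot route it.
    if "route-to-queue-ids" not in message:
        raise PoisonMessageError("message lacks queue IDs for redistribution")

def run_bridge(snf_queue, remote_notif_queues, max_restarts):
    restarts = 0
    while snf_queue and restarts < max_restarts:
        # Re-creating the cluster connection registers a new notification
        # consumer, which creates a fresh "notif.*" queue on the remote node.
        remote_notif_queues.append(f"notif.{restarts}")
        try:
            before_forward(snf_queue[0])
            snf_queue.pop(0)          # forwarded successfully
        except PoisonMessageError:
            restarts += 1             # connection closed, then re-created

    return restarts

snf = [{"body": "poison"}]            # message without queue-ID properties
notif_queues = []
restarts = run_bridge(snf, notif_queues, max_restarts=5)
print(restarts, len(notif_queues))    # prints "5 5"; the poison message is never drained
```

The cap on restarts exists only so the simulation terminates; in the reported scenario the loop runs indefinitely, which is why "notif.*" queues and resource usage grow without bound.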
A potential protection mechanism could move poisonous messages back to their original queue (if that queue is identifiable from the message properties); failing that, the invalid message could be moved to a dead-letter queue.
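The suggested protection can be sketched as follows. Property and function names here are hypothetical, not the real Artemis API; the point is only the decision logic: divert a non-forwardable message instead of letting the forward fail.

```python
# Sketch of the proposed protection (hypothetical names, not the Artemis
# API): before the bridge forwards an SNF message, a message that lacks
# queue-ID routing information is diverted instead of crashing the
# bridge -- back to its original queue when that queue is recorded in
# the message properties, otherwise to a dead-letter queue.

def divert_poison_message(message, queues, dlq):
    """Move a non-forwardable message out of the SNF queue; return its destination."""
    origin = message.get("original-queue")     # assumed property name
    if origin is not None and origin in queues:
        queues[origin].append(message)         # move back where it came from
        return origin
    dlq.append(message)                        # no identifiable origin: DLQ
    return "DLQ"

queues = {"orders": []}
dlq = []
print(divert_poison_message({"original-queue": "orders"}, queues, dlq))  # prints "orders"
print(divert_poison_message({"body": "no origin"}, queues, dlq))         # prints "DLQ"
```

Either branch keeps the cluster connection alive, which breaks the reconnect loop that floods node B with "notif.*" queues.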
Note:
The problem was originally seen when an operator moved a message stuck in a "duplicated" internal queue into the standard internal queue to start its redistribution.
Screenshots and related logs are provided in attachment.
Reproduction:
To reproduce, simply put or move a message into a $.artemis.internal queue.
This triggers the reconnection loop almost instantly on the node where the message was injected.
Resource usage on the node targeted by the $.artemis.internal queue rapidly increases as more and more "notif.*" queues are created.
Attachments
Issue Links
- duplicates: ARTEMIS-4924 "Proper handling of invalid messages in SNF queues" (Open)