[ARTEMIS-5140] Poisonous message in $.artemis.internal queue causes high resource usage on target redistribution node in cluster - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.36.0
Fix Version/s: None
Component/s: Broker, Clustering
Labels: None

Description

Configuration:
A cluster of three nodes (A, B, C) with message redistribution enabled.

Description:
When cluster connectivity starts, each Artemis node creates a $.artemis.internal queue for each of the other nodes in the cluster.
Messages pending redistribution are moved into these queues by Artemis.

On node C, if a poisonous (non-forwardable) message is added to, or moved into, a "$.artemis.internal" queue (here, the one targeting node B), the following happens:

- the cluster connection bridge attempts to forward the message
- the bridge fails at the beforeForward step because the message lacks properties essential for redistribution (no queue IDs), and an exception is thrown
- the cluster connection and its consumers are immediately closed
- one second later, the cluster connection and consumers are re-created, which triggers the creation of a new "notif.*" queue on node B

This sequence repeats in a loop and causes continuously high CPU and disk usage on node B, as the "activemq.notification" address keeps accumulating messages in the "notif.*" queues.
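The loop can be modelled in a few lines of plain Java to show why it never settles. This is a simplified sketch, not broker code: the class, the `Msg` record and the `routeToIds` property name are hypothetical stand-ins (the report only says the message lacks its queue IDs), each iteration stands for one reconnect attempt one second apart, and it assumes one new "notif.*" queue per reconnect, as described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

class RedistributionLoopSketch {

    // Hypothetical stand-in for a broker message: only its properties matter here.
    record Msg(Map<String, String> props) {}

    // Hypothetical property name for the target queue IDs the bridge needs.
    static final String ROUTE_TO_IDS = "routeToIds";

    // Models the beforeForward check: without queue IDs the message cannot be
    // redistributed, so the bridge throws.
    static Msg beforeForward(Msg m) {
        if (!m.props().containsKey(ROUTE_TO_IDS)) {
            throw new IllegalStateException("no queue IDs defined");
        }
        return m;
    }

    /**
     * Runs up to maxSeconds connection attempts (one per second) against the
     * internal queue and returns how many notif.* queues were created on node B.
     */
    static int simulate(Deque<Msg> internalQueue, int maxSeconds) {
        int notifQueuesOnB = 0;
        for (int second = 0; second < maxSeconds; second++) {
            notifQueuesOnB++; // (re)creating the cluster connection adds a notif.* queue on B
            try {
                while (!internalQueue.isEmpty()) {
                    beforeForward(internalQueue.peek());
                    internalQueue.poll(); // forwarded and acknowledged
                }
                return notifQueuesOnB; // queue drained: the connection stays up
            } catch (IllegalStateException e) {
                // Connection and consumers are closed. The poison message was never
                // acknowledged, so it is still at the head when we retry a second later.
            }
        }
        return notifQueuesOnB;
    }

    public static void main(String[] args) {
        Deque<Msg> healthy = new ArrayDeque<>();
        healthy.add(new Msg(Map.of(ROUTE_TO_IDS, "7")));
        System.out.println(simulate(healthy, 60)); // prints 1

        Deque<Msg> poisoned = new ArrayDeque<>();
        poisoned.add(new Msg(Map.of())); // e.g. a message moved in by hand
        System.out.println(simulate(poisoned, 60)); // prints 60
    }
}
```

With a valid message the bridge connects once and stays up; with a poison message at the head of the queue, every one-second retry adds another "notif.*" queue, so the count grows linearly for as long as the message stays there.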

A protection mechanism could move poisonous messages back to their original queue (if it can be identified from the message properties).
If that is not possible, the invalid message could be moved to a dead-letter queue.
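One way to picture the proposed protection is to validate each message before forwarding and set invalid ones aside instead of throwing. This is a plain-Java model, not Artemis code: the class, the `routeToIds`/`originalQueue` property names and the `DLQ` queue name are all hypothetical, and it assumes that the original queue, when known, is recorded as a message property.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PoisonMessageGuardSketch {

    record Msg(String id, Map<String, String> props) {}

    // Hypothetical property and queue names, chosen for illustration only.
    static final String ROUTE_TO_IDS = "routeToIds";
    static final String ORIGINAL_QUEUE = "originalQueue";
    static final String DEAD_LETTER_QUEUE = "DLQ";

    enum Outcome { FORWARDED, RETURNED_TO_ORIGINAL_QUEUE, SENT_TO_DLQ }

    private final Map<String, Deque<Msg>> queues = new HashMap<>();

    Deque<Msg> queue(String name) {
        return queues.computeIfAbsent(name, n -> new ArrayDeque<>());
    }

    /** Classifies one message instead of letting the bridge throw and reconnect. */
    private Outcome handle(String internalQueue, Msg m) {
        if (m.props().containsKey(ROUTE_TO_IDS)) {
            return Outcome.FORWARDED; // normal path (the actual send is not modelled)
        }
        String original = m.props().get(ORIGINAL_QUEUE);
        // Only move it back if the original queue is known and is not this same queue,
        // otherwise drain() would keep seeing the same message.
        if (original != null && !original.equals(internalQueue)) {
            queue(original).add(m);
            return Outcome.RETURNED_TO_ORIGINAL_QUEUE;
        }
        queue(DEAD_LETTER_QUEUE).add(m);
        return Outcome.SENT_TO_DLQ;
    }

    /** Empties an internal queue; poison messages are set aside rather than retried. */
    List<Outcome> drain(String internalQueue) {
        List<Outcome> outcomes = new ArrayList<>();
        Deque<Msg> q = queue(internalQueue);
        while (!q.isEmpty()) {
            outcomes.add(handle(internalQueue, q.poll()));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        PoisonMessageGuardSketch broker = new PoisonMessageGuardSketch();
        String internal = "$.artemis.internal.example"; // illustrative queue name
        broker.queue(internal).add(new Msg("ok", Map.of(ROUTE_TO_IDS, "42")));
        broker.queue(internal).add(new Msg("moved", Map.of(ORIGINAL_QUEUE, "orders")));
        broker.queue(internal).add(new Msg("unknown", Map.of()));
        System.out.println(broker.drain(internal));
        // prints [FORWARDED, RETURNED_TO_ORIGINAL_QUEUE, SENT_TO_DLQ]
    }
}
```

In this model a bad message costs a single move instead of a reconnect every second. The check against returning a message to the same internal queue covers the edge case where the recorded origin is the internal queue itself.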

Note:
The problem was first seen when an operator moved a message stuck in a "duplicated" internal queue into the standard internal queue to start its redistribution.

Screenshots and related logs are attached.

Reproduction:
To reproduce, simply put or move a message into a $.artemis.internal queue.
This triggers the reconnection loop almost instantly on the node where the message was injected.
Resource usage on the node targeted by the $.artemis.internal queue increases rapidly as more and more "notif.*" queues are created.

Attachments

- message-redistribution-failing-in-loop.log (10 kB, 31/Oct/24 15:52, Jean-Pascal Briquet)
- messages-accumulated-in-notif-queues.png (503 kB, 31/Oct/24 15:48, Jean-Pascal Briquet)
- notif-queue-created-in-loop.log (8 kB, 31/Oct/24 15:50, Jean-Pascal Briquet)
- notif-queues-growing.png (263 kB, 31/Oct/24 15:48, Jean-Pascal Briquet)

Issue Links

duplicates: ARTEMIS-4924 Proper handling of invalid messages in SNF queues (Open)

People

Assignee: Unassigned

Reporter: Jean-Pascal Briquet

Votes: 0

Watchers: 2

Dates

Created: 31/Oct/24 15:52

Updated: 01/Nov/24 01:49

Resolved: 31/Oct/24 16:09