  1. ActiveMQ Artemis
  2. ARTEMIS-5140

Poisonous message in $.artemis.internal queue causes high resource usage on target redistribution node in cluster


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.36.0
    • Fix Version/s: None
    • Component/s: Broker, Clustering
    • Labels: None

    Description

      Configuration:
      A cluster of three nodes (A, B, C) with message redistribution enabled.

      Description: 
      When cluster connectivity is started, each Artemis node creates a $.artemis.internal queue for each of the other nodes in the cluster.
      Messages pending redistribution are moved into these queues by Artemis.

      On node C, if a poisonous (non-forwardable) message is added or moved to a "$.artemis.internal" queue, the following sequence occurs:

      • the cluster connection bridge attempts to process the message
      • the bridge fails at the beforeForward step because the message lacks the properties required for redistribution (no queue IDs), which raises an exception
      • the cluster connection and consumers are closed immediately
      • one second later, the cluster connection and consumers are re-created, which triggers the creation of a "notif.*" queue on node B
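
      The steps above can be modeled as a toy loop. This is a minimal sketch in plain Java, not Artemis code: ToyBridge, Msg, run and the queueIds field are hypothetical names chosen for illustration, and the model assumes that every reconnect leaves one more "notif.*" queue behind on the remote node. It shows why the cycle does not end on its own: the failing message stays at the head of the internal queue, so each reconnect hits it again and valid messages behind it are never forwarded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the reconnect loop; every name here is hypothetical, not an Artemis class. */
final class ToyBridge {
    /** A message is forwardable only if it carries target queue IDs. */
    static final class Msg {
        final String body;
        final byte[] queueIds;

        Msg(String body, byte[] queueIds) {
            this.body = body;
            this.queueIds = queueIds;
        }
    }

    final Deque<Msg> internalQueue = new ArrayDeque<>();
    final List<String> remoteNotifQueues = new ArrayList<>();
    int forwarded = 0;

    /** Models the failing step: reject messages without queue IDs. */
    static void beforeForward(Msg m) {
        if (m.queueIds == null) {
            throw new IllegalStateException("no queue IDs on message: " + m.body);
        }
    }

    /** Runs up to maxCycles connect/consume cycles and returns the number of reconnects. */
    int run(int maxCycles) {
        int reconnects = 0;
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            // Assumption: each (re)connection creates a fresh notification queue remotely.
            remoteNotifQueues.add("notif." + cycle);
            try {
                while (!internalQueue.isEmpty()) {
                    beforeForward(internalQueue.peek()); // throws on the poisonous message
                    internalQueue.poll();                // removed only after a successful check
                    forwarded++;
                }
                return reconnects; // queue drained, no further reconnect needed
            } catch (IllegalStateException e) {
                reconnects++; // connection closed; the message is still at the head
            }
        }
        return reconnects;
    }

    public static void main(String[] args) {
        ToyBridge bridge = new ToyBridge();
        bridge.internalQueue.add(new Msg("poison", null));
        bridge.internalQueue.add(new Msg("valid", new byte[] {1}));
        int reconnects = bridge.run(5);
        // prints "5 reconnects, 5 notif queues, 0 forwarded"
        System.out.println(reconnects + " reconnects, " + bridge.remoteNotifQueues.size()
            + " notif queues, " + bridge.forwarded + " forwarded");
    }
}
```

      In this model the number of notification queues grows linearly with the number of cycles, which matches the steady growth reported on node B.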

      This sequence repeats in a loop and causes continuously high CPU and disk usage on node B, as the "activemq.notification" address keeps accumulating messages in the "notif.*" queues.

      A protection mechanism could move poisonous messages back to their original queue (if it can be identified from the message properties)
      or, if that is not possible, move the invalid message to a dead-letter queue.
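
      The proposed fallback order can be sketched as a small decision function. This is an illustration under stated assumptions, not the broker's implementation: PoisonGuard, Action, and the property keys "routeToIds" and "originalQueue" are hypothetical, and message properties are modeled as a plain map.

```java
import java.util.Map;

/** Sketch of a pre-forward guard; class name and property keys are illustrative assumptions. */
final class PoisonGuard {
    enum Action { FORWARD, RETURN_TO_ORIGIN, DEAD_LETTER }

    // Hypothetical property keys; a real broker would use its own header names.
    static final String ROUTE_IDS = "routeToIds";
    static final String ORIGINAL_QUEUE = "originalQueue";

    /** Decides what to do with a message found in an internal queue. */
    static Action classify(Map<String, ?> props) {
        Object ids = props.get(ROUTE_IDS);
        if (ids instanceof byte[] && ((byte[]) ids).length > 0) {
            return Action.FORWARD; // queue IDs present: safe to redistribute
        }
        Object origin = props.get(ORIGINAL_QUEUE);
        if (origin instanceof String && !((String) origin).isEmpty()) {
            return Action.RETURN_TO_ORIGIN; // not forwardable, but the source queue is known
        }
        return Action.DEAD_LETTER; // nothing usable: park the message instead of failing
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of(ROUTE_IDS, new byte[] {0, 0, 0, 42}))); // FORWARD
        System.out.println(classify(Map.of(ORIGINAL_QUEUE, "orders")));            // RETURN_TO_ORIGIN
        System.out.println(classify(Map.of()));                                    // DEAD_LETTER
    }
}
```

      Running such a check before the forwarding step would let the bridge set the message aside rather than throw, so the connection would not be torn down and re-created.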

      Note:
      The problem was first seen when an operator moved a message stuck in a "duplicated" internal queue into the standard internal queue to start its redistribution.

      Screenshots and related logs are attached.

       

      Reproduction:
      To reproduce, put or move a message into a $.artemis.internal queue.
      This triggers the reconnection loop almost instantly on the node where the message was injected.
      Resource usage on the node targeted by that $.artemis.internal queue increases rapidly as more and more "notif.*" queues are created.

       

      Attachments

        1. message-redistribution-failing-in-loop.log (10 kB, Jean-Pascal Briquet)
        2. messages-accumulated-in-notif-queues.png (503 kB, Jean-Pascal Briquet)
        3. notif-queue-created-in-loop.log (8 kB, Jean-Pascal Briquet)
        4. notif-queues-growing.png (263 kB, Jean-Pascal Briquet)

            People

              Assignee: Unassigned
              Reporter: Jean-Pascal Briquet (jpbriquet)