  1. ActiveMQ Artemis
  2. ARTEMIS-4794

CoreBridge: Duplicate messages when bridge is stopped / lost messages when bridge is paused while messages are being produced to the target node


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.30.0, 2.34.0, 2.35.0
    • Fix Version/s: 2.36.0
    • Component/s: None
    • Labels: None

    Description

      The attached test BridgeDuplicateMessagesARTEMIS4794Test.java demonstrates the issue with org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.

      Place it under tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/cluster/bridge

      Summary:
           When a bridge is stopped while messages are being produced to the target node, duplicate messages can result.

      Description:
          When a bridge is stopped programmatically while messages are being produced to the target node, the source node fails to receive the acknowledgement from the target node, so the messages now exist on both the source and the target node.

      It appears that the "active" flag, which is set to false when BridgeImpl.StopRunnable runs, prevents messages from being acknowledged by BridgeImpl::sendAcknowledged.
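
      The suspected mechanism can be illustrated with a minimal, self-contained sketch. This is a toy model, not the real BridgeImpl: the class FlagGatedAckSketch and all of its members are hypothetical, and it only assumes that the acknowledgement callback is ignored once the bridge is no longer active.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not the real BridgeImpl.
class FlagGatedAckSketch {
    // Messages the source still holds: forwarded but not yet acknowledged.
    private final Map<Integer, String> sourcePending = new LinkedHashMap<>();
    // Messages that have arrived on the target node.
    private final List<String> target = new ArrayList<>();
    private final boolean ackOnlyWhenActive;
    private boolean active = true;

    FlagGatedAckSketch(boolean ackOnlyWhenActive) {
        this.ackOnlyWhenActive = ackOnlyWhenActive;
    }

    // Forward one message: the target has it at once, the confirmation arrives later.
    void send(int id, String body) {
        sourcePending.put(id, body);
        target.add(body);
    }

    void stop() {
        active = false;
    }

    // Confirmation callback from the target, possibly arriving after stop().
    void sendAcknowledged(int id) {
        if (ackOnlyWhenActive && !active) {
            return; // confirmation dropped: the source keeps its copy
        }
        sourcePending.remove(id);
    }

    // Number of messages that exist on the source and the target at the same time.
    int duplicates() {
        int count = 0;
        for (String body : sourcePending.values()) {
            if (target.contains(body)) {
                count++;
            }
        }
        return count;
    }

    // Send one message, stop the bridge, then let the late confirmation arrive.
    static int run(boolean ackOnlyWhenActive) {
        FlagGatedAckSketch bridge = new FlagGatedAckSketch(ackOnlyWhenActive);
        bridge.send(1, "m1");
        bridge.stop();
        bridge.sendAcknowledged(1);
        return bridge.duplicates();
    }

    public static void main(String[] args) {
        System.out.println("acks gated by active flag: " + run(true));  // prints 1
        System.out.println("acks always processed:     " + run(false)); // prints 0
    }
}
```

      In this model, processing confirmations that arrive after stop() removes the duplicate. Whether that matches the actual fix is not stated in this report.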

       

      Context:
      This bug appears in my code (a custom plugin) because I start and stop the bridge programmatically to move messages from one node to another when certain conditions are met. When they are no longer met, I want to stop moving messages.
       

      Notes:

      • Changing the bridge configuration parameters useDuplicateDetection, confirmationWindowSize, or producerWindowSize does not mitigate the issue
      • Not related to large messages; I use large messages in my test only to ease reproduction
      • Reproduced on 2.30 and 2.34
      • Calling pause() instead does not create duplicates: server.getClusterManager().getBridges().get(bridgeName).pause();

       

       

       

      UPDATE: When using pause instead of stop in the above scenario, messages are no longer deliverable.
      Summary:
      When a bridge is paused while large messages are being produced to the target node, messages can become undeliverable to new consumers.
      Description:
      When a bridge is paused programmatically while large messages are being delivered to the target node, the task scheduled by BridgeImpl::deliverLargeMessage is not awaited. The bridge is paused first and the deliverLargeMessage Runnable runs afterwards, leaving the message in a state where it will not be delivered to new consumers.

      Notes:

      • PauseRunnable does not wait for tasks in the executor to complete, and deliverLargeMessage does create a task in the executor
        • We can see that deliverLargeMessage's task still runs after PauseRunnable has completed.
      • If I call bridge1.onCreditsFlow(true, null); to set the blockedOnFlowControl flag to true before calling pause, it prevents new tasks from being put on the executor and mitigates the issue, but it feels like a workaround and I think there might still be a race condition
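
      The ordering problem described in these notes can be reproduced outside Artemis with a plain single-threaded executor. The sketch below is a toy model under stated assumptions: PauseAwaitSketch and its methods are hypothetical names, the latch stands in for a slow large-message transfer, and pauseAndFlush is one possible way to wait for queued work, not necessarily what the fix in 2.36.0 does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: hypothetical names, not the real BridgeImpl or PauseRunnable.
class PauseAwaitSketch {
    // Single worker thread, so tasks run in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final CountDownLatch transferGate = new CountDownLatch(1);
    private final AtomicBoolean deliveryDone = new AtomicBoolean(false);
    private volatile boolean paused;

    // Hands the (simulated) large-message transfer to the executor and returns at once.
    void deliverLargeMessage() {
        executor.execute(() -> {
            try {
                transferGate.await(); // stands in for a slow transfer
                deliveryDone.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    void finishTransfer() {
        transferGate.countDown();
    }

    boolean isPaused() {
        return paused;
    }

    // Pause that only flips a flag: tasks already queued keep running afterwards.
    void pauseWithoutWaiting() {
        paused = true;
    }

    // Pause that also queues a marker task and waits for it, so every task
    // submitted earlier has finished by the time this method returns true.
    boolean pauseAndFlush(long timeout, TimeUnit unit) throws InterruptedException {
        paused = true;
        CountDownLatch flushed = new CountDownLatch(1);
        executor.execute(flushed::countDown);
        return flushed.await(timeout, unit);
    }

    // Reports whether the delivery task had finished at the moment pause returned.
    static boolean deliveryDoneWhenPauseReturns(boolean awaitTasks) {
        PauseAwaitSketch bridge = new PauseAwaitSketch();
        try {
            bridge.deliverLargeMessage();
            boolean done;
            if (awaitTasks) {
                // Let the transfer finish a little later, so pause really has to wait.
                Thread helper = new Thread(() -> {
                    try {
                        Thread.sleep(50);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    bridge.finishTransfer();
                });
                helper.start();
                bridge.pauseAndFlush(5, TimeUnit.SECONDS);
                done = bridge.deliveryDone.get();
            } else {
                bridge.pauseWithoutWaiting();
                done = bridge.deliveryDone.get(); // transfer is still in flight
                bridge.finishTransfer();          // the task completes after pause
            }
            bridge.executor.shutdown();
            bridge.executor.awaitTermination(5, TimeUnit.SECONDS);
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pause without waiting: " + deliveryDoneWhenPauseReturns(false)); // false
        System.out.println("pause and flush:       " + deliveryDoneWhenPauseReturns(true));  // true
    }
}
```

      The marker-task approach relies on the executor running tasks in submission order. It would not cover tasks submitted after the pause flag is set, which is the part the onCreditsFlow workaround addresses.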


          People

            Assignee: Justin Bertram (jbertram)
            Reporter: nmeylan
            Votes: 0
            Watchers: 3


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 50m
