ActiveMQ Artemis / ARTEMIS-1221

Duplicated ID causes LargeMessage lost at backup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.5, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: Broker
    • Labels: None

    Description

      When a large message is replicated to the backup, a pendingID is generated when the large message is finished. This pendingID is generated by a BatchingIDGenerator at the backup.

      It is possible for a pendingID generated at the backup to duplicate an ID generated at the live server.
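
      To illustrate how such a duplicate can arise, here is a minimal Java sketch. It models two independent block-reserving counters; the class SimpleBatchingIdGenerator and all starting values are hypothetical and are not Artemis's BatchingIDGenerator. The live server's generator resumes from its own journal state, while the backup's generator, never initialized from that journal, starts from a stale value, so the two ID ranges overlap.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of two independent ID generators; this is NOT
// Artemis's BatchingIDGenerator, only an illustration of overlapping ranges.
public class IdCollisionDemo {

    static final class SimpleBatchingIdGenerator {
        private final long batchSize;
        private long next;         // next ID to hand out
        private long reservedUpTo; // exclusive end of the currently reserved batch

        SimpleBatchingIdGenerator(long startFrom, long batchSize) {
            this.batchSize = batchSize;
            this.next = startFrom;
            this.reservedUpTo = startFrom;
        }

        long generateID() {
            if (next >= reservedUpTo) {
                // Reserve a new batch. A real generator would persist this
                // high-water mark; this sketch only keeps it in memory.
                reservedUpTo = next + batchSize;
            }
            return next++;
        }
    }

    // Draws n IDs from each generator and returns the first backup ID that the
    // live generator also produced, or -1 if there is no overlap.
    static long firstCollision(SimpleBatchingIdGenerator live,
                               SimpleBatchingIdGenerator backup, int n) {
        Set<Long> liveIds = new HashSet<>();
        for (int i = 0; i < n; i++) {
            liveIds.add(live.generateID());
        }
        for (int i = 0; i < n; i++) {
            long id = backup.generateID();
            if (liveIds.contains(id)) {
                return id;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Live resumes from its journal at 1000; the backup was never initialized
        // from that journal and starts from a stale value of 990 (both hypothetical).
        SimpleBatchingIdGenerator live = new SimpleBatchingIdGenerator(1000, 100);
        SimpleBatchingIdGenerator backup = new SimpleBatchingIdGenerator(990, 100);
        System.out.println("first colliding ID: " + firstCollision(live, backup, 20));
    }
}
```

      With these values the live server hands out 1000 to 1019 and the backup 990 to 1009, so the program prints "first colliding ID: 1000".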

      This causes a problem when a large message whose messageID is the same as another large message's pendingID is replicated and stored in the backup's journal, and a deleteRecord for that pendingID is then appended.

      If the backup becomes live and loads the journal, it drops the large message's add record because there is a deleteRecord with the same ID (even though that ID is the pendingID of another message). As a result, the client expecting this large message never receives it.
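
      The journal-loading step can be sketched as a replay in which records are matched purely by ID. This is a hypothetical, simplified Java model (JournalReplayDemo and JournalRecord are illustrative names, not Artemis's journal API), assuming only what the description states: an add record for message B is stored, and then a delete record carrying the same value, meant as message A's pendingID, is appended.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of journal replay at failover: records are keyed
// only by a long ID, and a delete removes any earlier add with that same ID.
public class JournalReplayDemo {

    enum Kind { ADD, DELETE }

    static final class JournalRecord {
        final Kind kind;
        final long id;
        final String note;

        JournalRecord(Kind kind, long id, String note) {
            this.kind = kind;
            this.id = id;
            this.note = note;
        }
    }

    // Replays the records in order and returns the add records that survive.
    static Map<Long, String> replay(List<JournalRecord> journal) {
        Map<Long, String> surviving = new LinkedHashMap<>();
        for (JournalRecord r : journal) {
            if (r.kind == Kind.ADD) {
                surviving.put(r.id, r.note);
            } else {
                // The delete says nothing about which logical record it was meant
                // for; matching the ID is enough to drop the add record.
                surviving.remove(r.id);
            }
        }
        return surviving;
    }

    public static void main(String[] args) {
        long sharedId = 1000L; // B's messageID (from live) == A's pendingID (from backup)
        List<JournalRecord> journal = new ArrayList<>();
        journal.add(new JournalRecord(Kind.ADD, 42L, "large message A"));
        journal.add(new JournalRecord(Kind.ADD, sharedId, "large message B"));
        journal.add(new JournalRecord(Kind.DELETE, sharedId, "delete for A's pendingID"));
        System.out.println(replay(journal)); // B is gone: {42=large message A}
    }
}
```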

          Activity

            githubbot ASF GitHub Bot added a comment -

            GitHub user gaohoward opened a pull request:

            https://github.com/apache/activemq-artemis/pull/1347

            ARTEMIS-1221 Duplicated ID causes LargeMessage lost at backup

            When a large message is replicated to the backup, a pendingID is generated
            when the large message is finished. This pendingID is generated by a
            BatchingIDGenerator at the backup.

            It is possible for a pendingID generated at the backup to duplicate
            an ID generated at the live server.

            This causes a problem when a large message whose messageID is
            the same as another large message's pendingID is replicated and stored
            in the backup's journal, and a deleteRecord for that pendingID
            is then appended. If the backup becomes live and loads the journal, it
            drops the large message's add record because there is a deleteRecord with
            the same ID (even though it is the pendingID of another message).
            As a result, the client expecting this large message never receives it.

            You can merge this pull request into a Git repository by running:

            $ git pull https://github.com/gaohoward/activemq-artemis master_1221

            Alternatively you can review and apply these changes as the patch at:

            https://github.com/apache/activemq-artemis/pull/1347.patch

            To close this pull request, make a commit to your master/trunk branch
            with (at least) the following in the commit message:

            This closes #1347


            commit 1111dde2d690c78d42973a653966dbdf32718eb1
            Author: Howard Gao <howard.gao@gmail.com>
            Date: 2017-06-19T08:31:07Z

            ARTEMIS-1221 Duplicated ID causes LargeMessage lost at backup

            When a large message is replicated to the backup, a pendingID is generated
            when the large message is finished. This pendingID is generated by a
            BatchingIDGenerator at the backup.

            It is possible for a pendingID generated at the backup to duplicate
            an ID generated at the live server.

            This causes a problem when a large message whose messageID is
            the same as another large message's pendingID is replicated and stored
            in the backup's journal, and a deleteRecord for that pendingID
            is then appended. If the backup becomes live and loads the journal, it
            drops the large message's add record because there is a deleteRecord with
            the same ID (even though it is the pendingID of another message).
            As a result, the client expecting this large message never receives it.


            githubbot ASF GitHub Bot added a comment -

            Github user clebertsuconic commented on a diff in the pull request:

            https://github.com/apache/activemq-artemis/pull/1347#discussion_r123297601

            --- Diff: artemis-server/src/main/java/org/apache/activemq/artemis/core/protocol/core/impl/wireformat/ReplicationLargeMessageBeginMessage.java ---
            @@ -44,11 +46,13 @@ public int expectedEncodeSize() {
               @Override
               public void encodeRest(final ActiveMQBuffer buffer) {
                  buffer.writeLong(messageId);
            --- End diff ---

            I'm not sure how this can possibly be working...

            expectedEncodeSize is not taking this new long into consideration...

            Besides, this will cause versioning issues...

            So, you will need to handle this on reading: check the position of the buffer against the declared size to see whether there's another long coming. If there is, read it; otherwise don't read it, and keep the current semantics.

            This is a -1... can you fix it?
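
            The reading strategy suggested in this review can be sketched as follows. This is a minimal, hypothetical Java example that uses java.nio.ByteBuffer rather than ActiveMQBuffer; the class name, the pendingId field and the -1 sentinel are assumptions for illustration, not the code that was merged. The encoder counts the optional long in its expected size, and the decoder reads that long only if at least eight bytes remain after messageId (here the buffer's remaining bytes stand in for the declared packet size).

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (not Artemis's ReplicationLargeMessageBeginMessage): a body
// that gained an optional trailing long, encoded and decoded compatibly.
public class OptionalTrailingFieldDemo {

    static final long NO_PENDING_ID = -1L; // hypothetical sentinel for "field absent"

    // The expected size must include the new long whenever it is written.
    static int expectedEncodeSize(boolean includePendingId) {
        return Long.BYTES + (includePendingId ? Long.BYTES : 0);
    }

    static ByteBuffer encode(long messageId, long pendingId, boolean includePendingId) {
        ByteBuffer buf = ByteBuffer.allocate(expectedEncodeSize(includePendingId));
        buf.putLong(messageId);
        if (includePendingId) {
            buf.putLong(pendingId);
        }
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    // Returns {messageId, pendingId}; pendingId is NO_PENDING_ID for old-format bodies.
    static long[] decode(ByteBuffer buf) {
        long messageId = buf.getLong();
        // Only read the extra long if the body actually contains it.
        long pendingId = buf.remaining() >= Long.BYTES ? buf.getLong() : NO_PENDING_ID;
        return new long[] {messageId, pendingId};
    }

    public static void main(String[] args) {
        long[] newFormat = decode(encode(7L, 1000L, true));
        long[] oldFormat = decode(encode(7L, 0L, false));
        System.out.println(newFormat[0] + " " + newFormat[1]); // 7 1000
        System.out.println(oldFormat[0] + " " + oldFormat[1]); // 7 -1
    }
}
```

            Because the decoder falls back to the sentinel when the extra bytes are absent, a body written by an older peer is still accepted with its current meaning.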

            githubbot ASF GitHub Bot added a comment -

            Github user gaohoward commented on the issue:

            https://github.com/apache/activemq-artemis/pull/1347

            @clebertsuconic ok, I'll fix it.

            githubbot ASF GitHub Bot added a comment -

            Github user gaohoward commented on the issue:

            https://github.com/apache/activemq-artemis/pull/1347

            @clebertsuconic
            Well, it turns out more work needs to be done. The code changes just happen to avoid the bug and make the test pass; the pendingIDs are not actually used at all. I need a bit more time to correct it.

            githubbot ASF GitHub Bot added a comment -

            Github user gaohoward commented on the issue:

            https://github.com/apache/activemq-artemis/pull/1347

            @clebertsuconic I think I've done it correctly this time. Pls review again.
            Thanks
            Howard

            githubbot ASF GitHub Bot added a comment -

            Github user clebertsuconic commented on the issue:

            https://github.com/apache/activemq-artemis/pull/1347

            whoa.. that's a nice Fix!!! Very nice!!!!

            githubbot ASF GitHub Bot added a comment -

            Github user gaohoward commented on the issue:

            https://github.com/apache/activemq-artemis/pull/1347

            @clebertsuconic Thanks!

            Commit d50f577cd50df37634f592db65200861fe3e13d3 in activemq-artemis's branch refs/heads/master from gaohoward
            [ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=d50f577 ]

            ARTEMIS-1221 Duplicated ID causes LargeMessage lost at backup

            When a large message is replicated to the backup, a pendingID is generated
            when the large message is finished. This pendingID is generated by a
            BatchingIDGenerator at the backup.

            It is possible for a pendingID generated at the backup to duplicate
            an ID generated at the live server.

            This causes a problem when a large message whose messageID is
            the same as another large message's pendingID is replicated and stored
            in the backup's journal, and a deleteRecord for that pendingID
            is then appended. If the backup becomes live and loads the journal, it
            drops the large message's add record because there is a deleteRecord with
            the same ID (even though it is the pendingID of another message).
            As a result, the client expecting this large message never receives it.

            So, in summary, the root cause is that the pendingIDs for large
            messages are generated at the backup while the backup is not live.

            The solution is that, instead of the backup generating
            the pendingIDs, they are all generated in advance
            at the live server and replicated to the backup wherever needed.
            The ID generator at the backup is only used once the backup becomes live
            (when it has been properly initialized from the journal).
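
            The idea behind this fix can be sketched as follows. It is a hypothetical, simplified Java model, not the merged code: BeginLargeMessage, LiveServer and Backup are illustrative names, and an AtomicLong stands in for the live server's ID generator. The live server draws both the messageID and the pendingID from its single sequence and ships the pendingID to the backup, which uses the replicated value instead of generating one locally.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the fix's idea: every ID, including pendingIDs, comes
// from the live server's single sequence, so a backup-side collision cannot occur.
public class LiveAssignedPendingIdDemo {

    // What the live server would replicate when a large message begins (hypothetical).
    static final class BeginLargeMessage {
        final long messageId;
        final long pendingId;

        BeginLargeMessage(long messageId, long pendingId) {
            this.messageId = messageId;
            this.pendingId = pendingId;
        }
    }

    static final class LiveServer {
        private final AtomicLong ids = new AtomicLong(1000); // hypothetical start value

        BeginLargeMessage beginLargeMessage() {
            long messageId = ids.getAndIncrement();
            long pendingId = ids.getAndIncrement(); // generated in advance, at live
            return new BeginLargeMessage(messageId, pendingId);
        }
    }

    static final class Backup {
        // Uses the replicated pendingID; the backup's own generator is left alone
        // until it becomes live and is initialized from the journal.
        long pendingIdFor(BeginLargeMessage msg) {
            return msg.pendingId;
        }
    }

    public static void main(String[] args) {
        LiveServer live = new LiveServer();
        Backup backup = new Backup();
        BeginLargeMessage a = live.beginLargeMessage();
        BeginLargeMessage b = live.beginLargeMessage();
        // Prints "A: 1000/1001, B: 1002/1003": no pendingID equals any messageID.
        System.out.println("A: " + a.messageId + "/" + backup.pendingIdFor(a)
                + ", B: " + b.messageId + "/" + backup.pendingIdFor(b));
    }
}
```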

            githubbot ASF GitHub Bot added a comment -

            Github user asfgit closed the pull request at:

            https://github.com/apache/activemq-artemis/pull/1347

            People

              Assignee: gaohoward Howard Gao
              Reporter: gaohoward Howard Gao
              Votes: 0
              Watchers: 3
