Qpid
  1. Qpid
  2. QPID-3304

Tagged federation messages in a transaction can cause subsequent dequeue to fail

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 0.13
    • Component/s: C++ Broker
    • Labels:
      None

      Description

      E.g. with Error dequeuing message, persistence id not set (MessageStoreImpl.cpp:1355) (exact error depends on store used).

      To reproduce, bind a durable queue to an exchange with a given binding key the set up a route between that same exchange on two brokers with the 'source broker' being the one with the durable queue bound to it. Then bind a queue on the 'destination' broker with the same binding key as used for the durable queue. Then publish a transactional message to the source brokers exchange with a matching routing key. The message will be enqueued on both the durable queue and the federation routes temporary bridge queue. However the persistence id for the message, set when enqueueing on the durable queue is lost and subsequent attempts to dequeue to fail.

      The problem is that the copy-on-write strategy used to workaround QPID-2670 means that the message on which the persistent store id is set is not then pushed onto the appropriate queues, the copy is (and the copy may not have the persistence id set on it).

        Activity

        Hide
        Gordon Sim added a comment -

        Suggested fix available for review: https://reviews.apache.org/r/888/

        Show
        Gordon Sim added a comment - Suggested fix available for review: https://reviews.apache.org/r/888/
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/888/
        -----------------------------------------------------------

        Review request for Kim van der Riet, Ted Ross and Steve Huston.

        Summary
        -------

        QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670. The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults.

        This patch attempts to solve that original problem more cleanly. The essential points of the patch are these:

        • prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection).
        • remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304).
        • have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection).

        There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests.

        This addresses bug QPID-3304.
        https://issues.apache.org/jira/browse/QPID-3304

        Diffs


        /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102
        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102
        /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102
        /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084
        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102
        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102
        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102
        /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102
        /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102
        /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102
        /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102
        /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102
        /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102

        Diff: https://reviews.apache.org/r/888/diff

        Testing
        -------

        Make check passes

        Thanks,

        Gordon

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/888/ ----------------------------------------------------------- Review request for Kim van der Riet, Ted Ross and Steve Huston. Summary ------- QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670 . The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults. This patch attempts to solve that original problem more cleanly. The essential points of the patch are these: prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection). remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304 ). have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection). There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests. This addresses bug QPID-3304 . https://issues.apache.org/jira/browse/QPID-3304 Diffs /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102 /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102 /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102 /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102 /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102 Diff: https://reviews.apache.org/r/888/diff Testing ------- Make check passes Thanks, Gordon
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/888/#review814
        -----------------------------------------------------------

        /trunk/qpid/cpp/src/qpid/broker/Message.h
        <https://reviews.apache.org/r/888/#comment1775>

        Would it make more sense to pass the lock in?

        • Andy

        On 2011-06-13 11:57:23, Gordon Sim wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/888/

        -----------------------------------------------------------

        (Updated 2011-06-13 11:57:23)

        Review request for Kim van der Riet, Ted Ross and Steve Huston.

        Summary

        -------

        QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670. The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults.

        This patch attempts to solve that original problem more cleanly. The essential points of the patch are these:

        * prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection).

        * remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304).

        * have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection).

        There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests.

        This addresses bug QPID-3304.

        https://issues.apache.org/jira/browse/QPID-3304

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102

        /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102

        /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102

        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102

        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102

        /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102

        /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102

        /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102

        /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102

        /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102

        /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102

        Diff: https://reviews.apache.org/r/888/diff

        Testing

        -------

        Make check passes

        Thanks,

        Gordon

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/888/#review814 ----------------------------------------------------------- /trunk/qpid/cpp/src/qpid/broker/Message.h < https://reviews.apache.org/r/888/#comment1775 > Would it make more sense to pass the lock in? Andy On 2011-06-13 11:57:23, Gordon Sim wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/888/ ----------------------------------------------------------- (Updated 2011-06-13 11:57:23) Review request for Kim van der Riet, Ted Ross and Steve Huston. Summary ------- QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670 . The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults. This patch attempts to solve that original problem more cleanly. The essential points of the patch are these: * prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection). * remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304 ). * have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection). There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests. This addresses bug QPID-3304 . https://issues.apache.org/jira/browse/QPID-3304 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102 /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102 /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102 /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102 /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102 Diff: https://reviews.apache.org/r/888/diff Testing ------- Make check passes Thanks, Gordon
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2011-06-13 16:20:26, Andy Goldstein wrote:

        > /trunk/qpid/cpp/src/qpid/broker/Message.h, line 206

        > <https://reviews.apache.org/r/888/diff/1/?file=21030#file21030line206>

        >

        > Would it make more sense to pass the lock in?

        I suppose it provides a more explicit hint regarding locking assumptions. The cluster plugin uses that pattern quite heavily; not so common elsewhere but I have no objection to changing it here.

        • Gordon

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/888/#review814
        -----------------------------------------------------------

        On 2011-06-13 11:57:23, Gordon Sim wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/888/

        -----------------------------------------------------------

        (Updated 2011-06-13 11:57:23)

        Review request for Kim van der Riet, Ted Ross and Steve Huston.

        Summary

        -------

        QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670. The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults.

        This patch attempts to solve that original problem more cleanly. The essential points of the patch are these:

        * prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection).

        * remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304).

        * have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection).

        There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests.

        This addresses bug QPID-3304.

        https://issues.apache.org/jira/browse/QPID-3304

        Diffs

        -----

        /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102

        /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102

        /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102

        /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084

        /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102

        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102

        /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102

        /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102

        /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102

        /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102

        /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102

        /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102

        /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102

        Diff: https://reviews.apache.org/r/888/diff

        Testing

        -------

        Make check passes

        Thanks,

        Gordon

        Show
        jiraposter@reviews.apache.org added a comment - On 2011-06-13 16:20:26, Andy Goldstein wrote: > /trunk/qpid/cpp/src/qpid/broker/Message.h, line 206 > < https://reviews.apache.org/r/888/diff/1/?file=21030#file21030line206 > > > Would it make more sense to pass the lock in? I suppose it provides a more explicit hint regarding locking assumptions. The cluster plugin uses that pattern quite heavily; not so common elsewhere but I have no objection to changing it here. Gordon ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/888/#review814 ----------------------------------------------------------- On 2011-06-13 11:57:23, Gordon Sim wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/888/ ----------------------------------------------------------- (Updated 2011-06-13 11:57:23) Review request for Kim van der Riet, Ted Ross and Steve Huston. Summary ------- QPID-3304 is essentially a knock-on effect of the rather nasty fix made for QPID-2670 . The original problem is that the headers were being modified in some cases after the header body was made available for delivery. This meant that concurrent delivery and modification could cause seg faults. This patch attempts to solve that original problem more cleanly. The essential points of the patch are these: * prevent access to modifiable header frame outside of message; have explicit modifying methods on the message class itself where correct locking can be used. (One exception here is that the message class still allows direct access to the underlying frames, currently used only in management and clustering as far as I can tell, both of which are safe by inspection). * remove the copy-on-write in the queue when adding trace id (i.e. revert previous fix to QPID-2670 that caused QPID-3304 ). * have the copy-on-write pattern encapsulated in the message class itself. I.e. when the frame containing a pointer to the header body has been written, make sure subsequent modifications happen to a clone of that body (as the body may still be in use in delivering the frames to a connection). There are then a few fixes in tests to use the new message scoped modifiers rather than manipulating the header external to the message class. These fixes may also be needed in store specific tests. This addresses bug QPID-3304 . https://issues.apache.org/jira/browse/QPID-3304 Diffs ----- /trunk/qpid/cpp/src/qpid/broker/DeliveryRecord.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.h 1130102 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1131084 /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.h 1130102 /trunk/qpid/cpp/src/qpid/framing/AMQFrame.cpp 1130102 /trunk/qpid/cpp/src/qpid/management/ManagementAgent.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicatingEventListener.cpp 1130102 /trunk/qpid/cpp/src/qpid/replication/ReplicationExchange.cpp 1130102 /trunk/qpid/cpp/src/tests/ExchangeTest.cpp 1130102 /trunk/qpid/cpp/src/tests/QueueTest.cpp 1130102 /trunk/qpid/cpp/src/tests/TxPublishTest.cpp 1130102 Diff: https://reviews.apache.org/r/888/diff Testing ------- Make check passes Thanks, Gordon
        Hide
        Kim van der Riet added a comment -

        Checked in Gordon's patch in r.1148503. Also included a set of tests which detect this condition.

        Show
        Kim van der Riet added a comment - Checked in Gordon's patch in r.1148503. Also included a set of tests which detect this condition.
        Hide
        Kim van der Riet added a comment -

        Further note on reproducing this error through the new tests:

        This error occurs when the following three conditions are met:

        1. The queue type is a regular route. Queue routes do not exhibit the problem. This is because regular routes copy the message to both the bridge queue and the source queue; queue routes do not do this.

        2. Persistence is enabled. Both the target queue and the test messages must be persistent.

        3. Transactions must be used for the enqueue operation on the source broker.

        Under these conditions, attempting to consume a test message from the destination broker after being transferred through the link will result in the following error conditions:

        1. If the message is consumed non-transactionally, the store will throw the "Dequeuing message with null persistence Id" message.

        2. If the message is consumed within a transaction, then the "internal-error: internal-error: Commit failed" message will be seen.

        A set of tests which test links under various combinations of exchange type, durability, transactionality and clustering for both regular and queue routes has been added. However, when run from "make check" in the qpid/cpp dir, tests involving durability are not run, as the async store is not known to the test environment. However, the async store now also has a federation test, and running "make check" from the store will call this set of tests in the qpid test dir and run all the tests with the store enabled. Since durability is required to show the error as described here, the tests must be run from the async store build dir.

        To avoid running all the tests, it is possible to execute the run_federation_sys_tests script from both the qpid cpp/src/test directory or from the async store cpp/test/federation directory.

        Note that the async store is not a part of Apache Qpid at the moment; see https://cwiki.apache.org/qpid/faq.html#FAQ-Persistence for further details on how to obtain this module.

        Show
        Kim van der Riet added a comment - Further note on reproducing this error through the new tests: This error occurs when the following three conditions are met: 1. The queue type is a regular route. Queue routes do not exhibit the problem. This is because regular routes copy the message to both the bridge queue and the source queue; queue routes do not do this. 2. Persistence is enabled. Both the target queue and the test messages must be persistent. 3. Transactions must be used for the enqueue operation on the source broker. Under these conditions, attempting to consume a test message from the destination broker after being transferred through the link will result in the following error conditions: 1. If the message is consumed non-transactionally, the store will throw the "Dequeuing message with null persistence Id" message. 2. If the message is consumed within a transaction, then the "internal-error: internal-error: Commit failed" message will be seen. A set of tests which test links under various combinations of exchange type, durability, transactionality and clustering for both regular and queue routes has been added. However, when run from "make check" in the qpid/cpp dir, tests involving durability are not run, as the async store is not known to the test environment. However, the async store now also has a federation test, and running "make check" from the store will call this set of tests in the qpid test dir and run all the tests with the store enabled. Since durability is required to show the error as described here, the tests must be run from the async store build dir. To avoid running all the tests, it is possible to execute the run_federation_sys_tests script from both the qpid cpp/src/test directory or from the async store cpp/test/federation directory. Note that the async store is not a part of Apache Qpid at the moment; see https://cwiki.apache.org/qpid/faq.html#FAQ-Persistence for further details on how to obtain this module.
        Hide
        Kim van der Riet added a comment -

        Thanks to Gordon Sim for the fix that resolved this issue.

        Show
        Kim van der Riet added a comment - Thanks to Gordon Sim for the fix that resolved this issue.

          People

          • Assignee:
            Kim van der Riet
            Reporter:
            Gordon Sim
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development