Qpid
  1. Qpid
  2. QPID-2420

Windows SQL-based persistence loses prepared two-phase transactions on broker restart

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: C++ Broker
    • Labels:
      None
    • Environment:

      Windows, SQL

      Description

      If the C++ broker restarts after a 2PC transaction is prepared but not committed, the transaction is lost (as if an abort was done on the transaction) when the broker restarts.

        Issue Links

          Activity

          Hide
          Steve Huston added a comment -

          This problem arose because the original MS SQL store module runs a dtx's operations under an outer SQL transaction and commits the SQL transaction on dtx commit; thus, a prepare followed by a crash loses the SQL transaction and any record of the dtx.

          To fix this, here's what I think should be done:

          Table changes:
          1. The store maintains a SQL table of messageId->queueId mappings, one for each place the message is queued. This table will have two more columns added: add/remove (Boolean; true =add, false=remove) and xid (reference to TPL table's xid key column).

          2. Add a new table TPL with prepared Xids. The xid is the key and is referred to by the new xid column in the message mapping table.

          Behavior changes:

          • When a dtx start is received (in the store, not in AMQP) a SQL transaction is started and a new TPL record is added with the xid.
          • On enqueue() add the message mapping record for the operation, and include the 'add' indicator and dtx's xid
          • On dequeue() add the xid to the existing mapping record and ensure the add/remove field is 'remove'
          • When dtx prepare is received in the store, commit the outer SQL transaction from dtx start
          • For dtx commit, under SQL transaction
            1. Remove the xid record from the TPL table
            2. For each mapping record that refers to the xid: delete 'remove' records, remove xid and clear 'add' indicator from add records.
          • For dtx rollback, under SQL transaction
            1. Remove the xid record from the TPL table
            2. For each mapping record that refers to the xid: delete 'add' records; remove xid value from 'remove' records.
          • On recovery
            1. Read all the xids from the TPL table; there's one for each prepared dtx; call recoverTransaction for each
            2. For each xid, get the mapping records associated with the xid; based on the add/remove indicator, call either RecoverableTransaction::enqueue() or dequeue()
          Show
          Steve Huston added a comment - This problem arose because the original MS SQL store module runs a dtx's operations under an outer SQL transaction and commits the SQL transaction on dtx commit; thus, a prepare followed by a crash loses the SQL transaction and any record of the dtx. To fix this, here's what I think should be done: Table changes: 1. The store maintains a SQL table of messageId->queueId mappings, one for each place the message is queued. This table will have two more columns added: add/remove (Boolean; true =add, false=remove) and xid (reference to TPL table's xid key column). 2. Add a new table TPL with prepared Xids. The xid is the key and is referred to by the new xid column in the message mapping table. Behavior changes: When a dtx start is received (in the store, not in AMQP) a SQL transaction is started and a new TPL record is added with the xid. On enqueue() add the message mapping record for the operation, and include the 'add' indicator and dtx's xid On dequeue() add the xid to the existing mapping record and ensure the add/remove field is 'remove' When dtx prepare is received in the store, commit the outer SQL transaction from dtx start For dtx commit, under SQL transaction 1. Remove the xid record from the TPL table 2. For each mapping record that refers to the xid: delete 'remove' records, remove xid and clear 'add' indicator from add records. For dtx rollback, under SQL transaction 1. Remove the xid record from the TPL table 2. For each mapping record that refers to the xid: delete 'add' records; remove xid value from 'remove' records. On recovery 1. Read all the xids from the TPL table; there's one for each prepared dtx; call recoverTransaction for each 2. For each xid, get the mapping records associated with the xid; based on the add/remove indicator, call either RecoverableTransaction::enqueue() or dequeue()
          Hide
          Steve Huston added a comment -

          The bulk of the changes are in the trunk at r935068.

          I still need to commit the tests, after testing brokertest.py changes on Linux.
          Also need to add the trigger creation which removes unreferenced messages from the messages table.

          Show
          Steve Huston added a comment - The bulk of the changes are in the trunk at r935068. I still need to commit the tests, after testing brokertest.py changes on Linux. Also need to add the trigger creation which removes unreferenced messages from the messages table.
          Hide
          Steve Huston added a comment -

          Added trigger to delete orphaned records and also test execution when store is part of the build. On trunk at r935710.

          Need to enable the broker-restarting/recovering tests that are commented out in run_store_tests.ps1 after brokertest.py is resolved to work for both Windows and Linux w/ Python 2.3. See QPID-2492.

          Show
          Steve Huston added a comment - Added trigger to delete orphaned records and also test execution when store is part of the build. On trunk at r935710. Need to enable the broker-restarting/recovering tests that are commented out in run_store_tests.ps1 after brokertest.py is resolved to work for both Windows and Linux w/ Python 2.3. See QPID-2492 .
          Hide
          Steve Huston added a comment -

          Fixed previously; was waiting for brokertespy to be resolved and it also has been fixed.

          Show
          Steve Huston added a comment - Fixed previously; was waiting for brokertespy to be resolved and it also has been fixed.

            People

            • Assignee:
              Steve Huston
              Reporter:
              Steve Huston
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Due:
                Created:
                Updated:
                Resolved:

                Development