Derby
  1. Derby
  2. DERBY-5562

An read-only XA transaction that has a timeout never has the timer canceled when the transaction is complete

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.8.2.2
    • Fix Version/s: 10.8.3.0, 10.9.1.0
    • Component/s: JDBC
    • Labels:
      None
    • Environment:
      Solaris 10
      Glassfish V2.1.1
    • Bug behavior facts:
      Deviation from standard, Seen in production

      Description

      From the EmbedConnection.java comment for "xa_prepare".

      // On a prepare call, xa allows an optimization that if the
      // transaction is read only, the RM can just go ahead and
      // commit it. So if store returns this read only status -
      // meaning store has taken the liberty to commit already - we
      // needs to turn around and call internalCommit (without
      // committing the store again) to make sure the state is
      // consistent. Since the transaction is read only, there is
      // probably not much that needs to be done.

      When XATransactionController.XA_RDONLY is returned, no "commit" or "rollback" will be received. If a XA transaction timer was started (could be through the derby.jdbc.xaTransactionTimeout=xxx for example), then these transactions will not have their timer canceled and will later attempt to cancel the transaction and fail as the transaction no longer exists.

      1. DERBY-5562.patch
        0.9 kB
        Brett Bergquist
      2. D5562.java
        2 kB
        Knut Anders Hatlen
      3. d5562-1a-with-test.diff
        6 kB
        Knut Anders Hatlen

        Activity

        Hide
        Brett Bergquist added a comment -

        A patch to fix this problem. This calls XATransactionState.xa_finalize() to release the timer because no call to rollback or commit is going to come for a read-only transaction.

        Show
        Brett Bergquist added a comment - A patch to fix this problem. This calls XATransactionState.xa_finalize() to release the timer because no call to rollback or commit is going to come for a read-only transaction.
        Hide
        Knut Anders Hatlen added a comment -

        I'm attempting to write a test for this bug. What symptom should I be looking for? The bug description mentions that the cancellation timer will fail as the transaction no longer exists after preparing a read-only transaction.

        I tried to reproduce this by running the attached program, which prepares a read-only transaction with a 20 second timeout. After 20 seconds, I see the following printed to derby.log:

        The XA transaction timed out and is going to be rolled back. The transaction Xid is (1,0102,0304).

        However, I don't see any failures. What's missing in order to reproduce the failures?

        (I guess an alternative way to reproduce the bug would be to start a new transaction using the same xid after preparing the first one, and checking if the second transaction gets cancelled by the timer from the first transaction.)

        Show
        Knut Anders Hatlen added a comment - I'm attempting to write a test for this bug. What symptom should I be looking for? The bug description mentions that the cancellation timer will fail as the transaction no longer exists after preparing a read-only transaction. I tried to reproduce this by running the attached program, which prepares a read-only transaction with a 20 second timeout. After 20 seconds, I see the following printed to derby.log: The XA transaction timed out and is going to be rolled back. The transaction Xid is (1,0102,0304). However, I don't see any failures. What's missing in order to reproduce the failures? (I guess an alternative way to reproduce the bug would be to start a new transaction using the same xid after preparing the first one, and checking if the second transaction gets cancelled by the timer from the first transaction.)
        Hide
        Brett Bergquist added a comment -

        The failure is that the transaction timeout is not canceled when it should be and will produce a message in the log when this is the case.

        The test needs to create a XA transaction with timeout (I used the derby.jdbc.xaTransactionTimeout to set the timeout), start the XA transaction, use read-only SQL to the database and then prepare the XA transaction. The prepare code detects that only read-only work was done and implicitly commits the transaction. The prepare returns XA_RDONLY and you do not commit the transaction in the test code. The timer started by the "start" method will not be canceled and the timer will continue and when fired will try to rollback the transaction that has already be implicitly committed and ended.

        See issue

        https://issues.apache.org/jira/browse/DERBY-960

        which fixed the prepare to return XA_DDONLY.

        Show
        Brett Bergquist added a comment - The failure is that the transaction timeout is not canceled when it should be and will produce a message in the log when this is the case. The test needs to create a XA transaction with timeout (I used the derby.jdbc.xaTransactionTimeout to set the timeout), start the XA transaction, use read-only SQL to the database and then prepare the XA transaction. The prepare code detects that only read-only work was done and implicitly commits the transaction. The prepare returns XA_RDONLY and you do not commit the transaction in the test code. The timer started by the "start" method will not be canceled and the timer will continue and when fired will try to rollback the transaction that has already be implicitly committed and ended. See issue https://issues.apache.org/jira/browse/DERBY-960 which fixed the prepare to return XA_DDONLY.
        Hide
        Knut Anders Hatlen added a comment -

        Thanks. The attached patch adds a test case that exposes the bug. It prepares a read-only transaction that has a timeout, and then starts a new transaction with the same Xid and no timeout. The second transaction sleeps for while to make sure it exceeds the timeout of the first transaction. The second transaction is aborted because of the bug, and the test case fails. With the fix, the transaction is not aborted, and the test case passes.

        The patch also moves the fix from EmbedXAResource to XATransactionState, so that all handling of the timer is private to that class. I also made XATransactionState.xa_finalize() private to emphasize this (and removed the synchronized modifier since all callers already synchronize before calling the method).

        All the regression tests ran cleanly with this patch.

        Show
        Knut Anders Hatlen added a comment - Thanks. The attached patch adds a test case that exposes the bug. It prepares a read-only transaction that has a timeout, and then starts a new transaction with the same Xid and no timeout. The second transaction sleeps for while to make sure it exceeds the timeout of the first transaction. The second transaction is aborted because of the bug, and the test case fails. With the fix, the transaction is not aborted, and the test case passes. The patch also moves the fix from EmbedXAResource to XATransactionState, so that all handling of the timer is private to that class. I also made XATransactionState.xa_finalize() private to emphasize this (and removed the synchronized modifier since all callers already synchronize before calling the method). All the regression tests ran cleanly with this patch.
        Hide
        Dag H. Wanvik added a comment -

        Looks good to me. I tried the test without the patch and it failed as expected. +1

        Show
        Dag H. Wanvik added a comment - Looks good to me. I tried the test without the patch and it failed as expected. +1
        Hide
        Mike Matrigali added a comment -

        change and test look good to me. Any idea how many versions back this problem affects?

        Show
        Mike Matrigali added a comment - change and test look good to me. Any idea how many versions back this problem affects?
        Hide
        Brett Bergquist added a comment -

        It appears the timer was implemented in https://issues.apache.org/jira/browse/DERBY-2508 which has a fix at 10.3.1.4

        Show
        Brett Bergquist added a comment - It appears the timer was implemented in https://issues.apache.org/jira/browse/DERBY-2508 which has a fix at 10.3.1.4
        Hide
        Knut Anders Hatlen added a comment -

        Committed to trunk with revision 1230480. I also plan to backport the fix to 10.8.

        Show
        Knut Anders Hatlen added a comment - Committed to trunk with revision 1230480. I also plan to backport the fix to 10.8.
        Hide
        Knut Anders Hatlen added a comment -

        The nightly tests passed on all platforms with the fix, and so did a manual run of the tests with the fix on the 10.8 branch (except one occurrence of DERBY-5498). Backported to 10.8 and committed revision 1230914.

        Show
        Knut Anders Hatlen added a comment - The nightly tests passed on all platforms with the fix, and so did a manual run of the tests with the fix on the 10.8 branch (except one occurrence of DERBY-5498 ). Backported to 10.8 and committed revision 1230914.
        Hide
        Knut Anders Hatlen added a comment -

        [bulk update: close all resolved issues that haven't had any activity the last year]

        Show
        Knut Anders Hatlen added a comment - [bulk update: close all resolved issues that haven't had any activity the last year]

          People

          • Assignee:
            Knut Anders Hatlen
            Reporter:
            Brett Bergquist
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development