Qpid
  1. Qpid
  2. QPID-4503

[Java broker] Producer transaction timeout detection feature may log spurious open/idle alerts and close client connections/sessions without good cause

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.12, 0.14, 0.16, 0.18, 0.20
    • Fix Version/s: 0.20
    • Component/s: Java Broker
    • Labels:
      None

      Description

      There is a race condition in AMQChannel#checkTransactionStatus (and ServerSession#checkTransactionStatus) which can lead to spurious open/idle alerts and needlessly closed client connections (0-8..0-9-1) or client sessions (0-10).

      When the defect manifests, the alert generated incorrectly identifies the transaction as being open since Jan 1, 1970. eg. 1,355,281,641,112 ms

      e.g.

      [con:11(guest@/127.0.0.1:47276/test)/ch:2] CHN-1007 : Open Transaction : 1,355,281,641,112 ms
      

      This defect affects only users who have configured the transaction timeout feature (as described http://qpid.apache.org/books/trunk/AMQP-Messaging-Broker-Java-Book/html/Java-Broker-Runtime-Producer-Transaction-Timeout.html).

        Activity

        Hide
        Keith Wall added a comment - - edited

        I am able to reliably reproduce this defect on my our machine (Intel Xeon E5620) with the attached perf test script and a virtualhost config to use BDB, with transaction timeout configured and housekeeping at 25ms checkPeriod. With this configuration, I see about 20-30 spurious reports per 120s test run. See attachment.

        Show
        Keith Wall added a comment - - edited I am able to reliably reproduce this defect on my our machine (Intel Xeon E5620) with the attached perf test script and a virtualhost config to use BDB, with transaction timeout configured and housekeeping at 25ms checkPeriod. With this configuration, I see about 20-30 spurious reports per 120s test run. See attachment.
        Hide
        Keith Wall added a comment -

        Hi Robbie, would you mind reviewing this commit?

        Show
        Keith Wall added a comment - Hi Robbie, would you mind reviewing this commit?
        Hide
        Robbie Gemmell added a comment -

        Looks good to me. Should we ask to include this in 0.20?

        Show
        Robbie Gemmell added a comment - Looks good to me. Should we ask to include this in 0.20?
        Hide
        Keith Wall added a comment -

        There is a second race condition in the transaction timeout feature that can cause a spurious idle warning (or spurious idle closes).

        It is a race between the housekeeper and a newly begun transaction and causes the last activity time of the previous transaction to be used rather than the current one. It produces a spurious alert with an apparently genuine millisecond value e.g.

        CHN-1008 : Idle Transaction : 5,007 ms
        (for the case where the previous transaction was active approximately 5 seconds ago)

        The issue is when a transaction commits (or rolls-back), the transactionUpdateTime is not reset to zero. This means that as the next transaction begins (whichsets the transactionStartTime then updates the transactionUpdateTime), there exists a window of time where the transactionStartTime refers to the new transaction while the transactionUpdateTime exists to the previous transaction. If the housekeeper happens to run in the intervening moment, a spurious message will result.

                                               + Housekeeper observes txn2 startTime and txn1 updateTime!!!
                                               |
                                               |
        10                                15   |
        |                                 |    v
        -----------------------------------------------------------> time
        ^          ^                         ^             ^
        txn1 send  txn1 commit               txn2 send     txn2 commit
        ^  ^       ^                         ^   ^
        |  |       |                         |   |
        |  |       |                         |   txUpdateTime set
        |  |       |                         |
        |  |       |                         txStartTime set
        |  |       txStartTime clear
        |  txUpdateTime set
        txStartTime set
        
        Show
        Keith Wall added a comment - There is a second race condition in the transaction timeout feature that can cause a spurious idle warning (or spurious idle closes). It is a race between the housekeeper and a newly begun transaction and causes the last activity time of the previous transaction to be used rather than the current one. It produces a spurious alert with an apparently genuine millisecond value e.g. CHN-1008 : Idle Transaction : 5,007 ms (for the case where the previous transaction was active approximately 5 seconds ago) The issue is when a transaction commits (or rolls-back), the transactionUpdateTime is not reset to zero. This means that as the next transaction begins (whichsets the transactionStartTime then updates the transactionUpdateTime), there exists a window of time where the transactionStartTime refers to the new transaction while the transactionUpdateTime exists to the previous transaction. If the housekeeper happens to run in the intervening moment, a spurious message will result. + Housekeeper observes txn2 startTime and txn1 updateTime!!! | | 10 15 | | | v -----------------------------------------------------------> time ^ ^ ^ ^ txn1 send txn1 commit txn2 send txn2 commit ^ ^ ^ ^ ^ | | | | | | | | | txUpdateTime set | | | | | | | txStartTime set | | txStartTime clear | txUpdateTime set txStartTime set
        Hide
        Keith Wall added a comment -

        Patch applied. Phil, can you review this commit please?

        Show
        Keith Wall added a comment - Patch applied. Phil, can you review this commit please?
        Hide
        Philip Harvey added a comment -

        reviewed the code - looks good.

        Show
        Philip Harvey added a comment - reviewed the code - looks good.
        Hide
        Justin Ross added a comment -

        Reviewed by Robbie and Philip, with some extra info for my benefit from Robbie on the list. Approved for 0.20.

        Show
        Justin Ross added a comment - Reviewed by Robbie and Philip, with some extra info for my benefit from Robbie on the list. Approved for 0.20.
        Hide
        Robbie Gemmell added a comment -
        Show
        Robbie Gemmell added a comment - These were merged to the 0.20 release branch: http://svn.apache.org/viewvc?rev=1430904&view=rev http://svn.apache.org/viewvc?rev=1430909&view=rev

          People

          • Assignee:
            Philip Harvey
            Reporter:
            Keith Wall
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development