Qpid / QPID-4780

HA broker deadlock after loss of primary broker

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20
    • Fix Version/s: 0.23
    • Component/s: C++ Clustering
    • Labels:
      None

      Description

      Description of problem:
      While fencing nodes in a cluster, a broker that was previously a backup occasionally deadlocks while deleting auto-delete queues. The issue was noticed only because 'qpid-ha promote' hangs when attempting to promote a backup to primary.

      Version-Release number of selected component (if applicable):
      Qpid 0.18

      How reproducible:
      Rare (race condition)
      see also: https://bugzilla.redhat.com/show_bug.cgi?id=889552

      Steps to Reproduce:
      1. Start HA-enabled brokers
      2. Create tens-of-thousands of auto-delete queues
      3. Fence / power-cycle the node hosting the primary broker

      Actual results:
      Occasionally the backup broker deadlocks

      Expected results:
      The backup broker does not deadlock

      Additional info:

        Activity

        Justin Ross added a comment - Released in Qpid 0.24, http://qpid.apache.org/releases/qpid-0.24/index.html

        Alan Conway added a comment - - edited

        Unable to reproduce, but found a lock ordering deadlock by inspection of the code that would lead to the stack trace given:

        In one thread:

        • Link::ioThreadProcessing takes Link::lock, then calls
        • QueueReplicator::initializeBridge, which tries to lock QueueReplicator::lock.

        Concurrently, in another thread:

        • QueueReplicator::destroy takes QueueReplicator::lock, then calls
        • Bridge::destroy, which tries to lock Link::lock.

        This patch removes the locking around the Bridge::destroy call in QueueReplicator::destroy.
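
The lock-order inversion described above can be illustrated with a minimal sketch. It assumes two plain `std::mutex` objects as hypothetical stand-ins for Link::lock and QueueReplicator::lock and is not the broker code. Each thread takes its first lock, waits until the other thread holds its own first lock, then probes the second lock with `try_lock` instead of blocking, so the example shows the cycle without actually hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for Link::lock and QueueReplicator::lock.
struct Locks {
    std::mutex link;
    std::mutex replicator;
};

// Spin until `expected` threads have reached this point.
static void arriveAndWait(std::atomic<int>& arrived, int expected) {
    ++arrived;
    while (arrived.load() < expected) std::this_thread::yield();
}

// Take `first`, wait until the peer holds its own first lock, then probe
// `second` without blocking. Returns true if `second` was unavailable.
static bool secondLockUnavailable(std::mutex& first, std::mutex& second,
                                  std::atomic<int>& holdingFirst,
                                  std::atomic<int>& probedSecond) {
    std::lock_guard<std::mutex> guard(first);
    arriveAndWait(holdingFirst, 2);
    bool unavailable = !second.try_lock();
    if (!unavailable) second.unlock();
    arriveAndWait(probedSecond, 2);  // keep `first` held until the peer has probed
    return unavailable;
}

// Returns true when both threads found their second lock held by the other,
// which is the state that deadlocks if blocking lock() calls are used.
bool lockOrderInversionBlocksBothThreads() {
    Locks locks;
    std::atomic<int> holdingFirst(0), probedSecond(0);
    bool t1Stuck = false, t2Stuck = false;

    // Thread 1 order: Link lock, then QueueReplicator lock.
    std::thread t1([&] {
        t1Stuck = secondLockUnavailable(locks.link, locks.replicator,
                                        holdingFirst, probedSecond);
    });
    // Thread 2 order: QueueReplicator lock, then Link lock.
    std::thread t2([&] {
        t2Stuck = secondLockUnavailable(locks.replicator, locks.link,
                                        holdingFirst, probedSecond);
    });
    t1.join();
    t2.join();
    return t1Stuck && t2Stuck;
}
```

In the real broker the two acquisitions are blocking and the interleaving depends on timing, which is consistent with the issue being reported as a rare race.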

        ------------------------------------------------------------------------
        r1476305 | aconway | 2013-04-26 13:28:26 -0400 (Fri, 26 Apr 2013) | 9 lines

        QPID-4780: Bug 889552 - HA broker deadlock after loss of primary broker.

        Lock ordering deadlock found by inspection of code and stack trace:

        • thread 1: Link::ioThreadProcessing(Link::lock)-> QueueReplicator::initializeBridge(QueueReplicator::lock)
        • thread 2: QueueReplicator::destroy(QueueReplicator::lock)-> Bridge::destroy(Link::lock)

        This patch breaks the lock by removing locking around Bridge::destroy in QueueReplicator::destroy.

        Committed to trunk
        ------------------------------------------------------------------------
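
The commit message says only that the locking around Bridge::destroy was removed. One common way to do that is sketched below, using simplified, hypothetical `Link`, `Bridge` and `QueueReplicator` types rather than the real qpid classes: the replicator detaches its bridge pointer while holding its own lock, releases that lock, and only then calls the method that needs the Link lock. The `onDestroy` hook is an assumption added purely so the example can check that the replicator lock is free at that point.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-ins; not the real qpid::broker / qpid::ha classes.
struct Link {
    std::mutex lock;
};

struct Bridge {
    explicit Bridge(Link& l) : link(l), destroyed(false) {}

    void destroy() {
        std::lock_guard<std::mutex> guard(link.lock);  // needs the Link lock
        if (onDestroy) onDestroy();                    // test hook (assumption)
        destroyed = true;
    }

    Link& link;
    bool destroyed;
    std::function<void()> onDestroy;
};

struct QueueReplicator {
    explicit QueueReplicator(std::shared_ptr<Bridge> b) : bridge(std::move(b)) {}

    void destroy() {
        std::shared_ptr<Bridge> detached;
        {
            // Hold our own lock only long enough to detach the bridge.
            std::lock_guard<std::mutex> guard(lock);
            detached.swap(bridge);
        }
        // The replicator lock is released here, so taking the Link lock
        // inside Bridge::destroy cannot complete a lock-order cycle.
        if (detached) detached->destroy();
    }

    std::mutex lock;
    std::shared_ptr<Bridge> bridge;
};

// Returns true if the replicator lock was free while Bridge::destroy ran.
bool destroyDoesNotHoldReplicatorLock() {
    Link link;
    std::shared_ptr<Bridge> bridge = std::make_shared<Bridge>(link);
    QueueReplicator replicator(bridge);

    bool lockWasFree = false;
    bridge->onDestroy = [&] {
        lockWasFree = replicator.lock.try_lock();
        if (lockWasFree) replicator.lock.unlock();
    };

    replicator.destroy();
    return bridge->destroyed && lockWasFree;
}
```

With this shape, no thread holds both locks in the QueueReplicator-then-Link order, so only the Link-then-QueueReplicator order from the other thread remains.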


          People

          • Assignee: Alan Conway
          • Reporter: Alan Conway
          • Votes: 0
          • Watchers: 2
