Qpid / QPID-3757

Difficult recovery on broker death

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12
    • Fix Version/s: 0.17
    • Component/s: C++ Client
    • Labels:
      None
    • Environment:

      RHEL 4.7 and RHEL 6.2

      Description

      With the old API (if that API is fully deprecated, this bug may be invalid), Subscription and LocalQueue variables cannot be recovered after the broker dies unless you follow a precise workaround procedure.

      The problem is:
      If the broker dies and is respawned, and one reconnects to the new broker without creating a new Session (i.e., reuses the old one), bad things happen (since Session doesn't yet support resume(), I assume that's expected behavior).
      If, however, one creates new Session, SubscriptionManager, and Subscription objects, an assertion failure is generated (backtrace attached).
      After reading the backtrace, I believe the following is happening:
      1) During recovery, we assign a new Subscription to the previous Subscription variable (i.e., "sub = subMgr->subscribe()").
      2) That causes the refcount of the old Subscription to fall to 0, so it is cleaned up.
      3) As part of that cleanup, the associated SubscriptionImpl object destroys its demuxRule member (a std::auto_ptr<ScopedDivert>).
      4) That demuxRule member holds a reference to a Demux object, demuxer, which lives inside the Session object. Since the Session object has been re-created, that old reference is invalid, which results in the assertion.

      Thus, we have a fatal circle: we need to create a new Session object to proceed, but doing so leaves us unable to reuse the Subscription variables.

      Gordon proposed a workaround that does solve the problem for me in practice: assign "null" Subscription and LocalQueue objects to those variables before re-creating the Session object. Unfortunately, this won't be obvious to new users, so anyone still using the old API is likely to run into it.
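
      The ordering problem and the workaround can be sketched with a small standalone model. This is only an illustration under stated assumptions: Demux, Session, SubscriptionImpl and Subscription below are hypothetical stand-ins, not the Qpid classes; std::shared_ptr plays the role of the ref-counted handle, and a std::weak_ptr is used so the model can record a stale demuxer safely where, according to the report, the real client asserts.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Qpid classes.
struct Demux {
    std::vector<std::string> rules;  // active divert rules
};

struct Session {
    // The demuxer lives and dies with its session.
    std::shared_ptr<Demux> demuxer = std::make_shared<Demux>();
};

using Log = std::vector<std::string>;

// Registers a rule on construction and must remove it from the same Demux
// on destruction, like the demuxRule member described in the report.
class SubscriptionImpl {
  public:
    SubscriptionImpl(Session& session, const std::string& name, Log& log)
        : demuxer_(session.demuxer), name_(name), log_(log) {
        session.demuxer->rules.push_back(name_);
    }
    ~SubscriptionImpl() {
        if (std::shared_ptr<Demux> d = demuxer_.lock()) {
            std::vector<std::string>& r = d->rules;
            r.erase(std::remove(r.begin(), r.end(), name_), r.end());
            log_.push_back(name_ + ":clean");
        } else {
            // The owning session is already gone. The model only records this;
            // per the report, the real client fails an assertion here.
            log_.push_back(name_ + ":stale");
        }
    }
  private:
    std::weak_ptr<Demux> demuxer_;  // assumption: weak_ptr so staleness is detectable
    std::string name_;
    Log& log_;
};

// A null shared_ptr models a "null" Subscription handle.
using Subscription = std::shared_ptr<SubscriptionImpl>;

// Simulates recovery after a broker restart, with or without the workaround.
void recover(bool resetHandleFirst, Log& log) {
    auto session = std::make_unique<Session>();
    Subscription sub = std::make_shared<SubscriptionImpl>(*session, "old", log);
    assert(sub);

    if (resetHandleFirst) {
        sub.reset();  // workaround: release the old impl while its session is alive
    }
    session = std::make_unique<Session>();  // old session and its demuxer are destroyed
    // Without the workaround, this assignment drops the last reference to the
    // old impl, whose cleanup then finds that its demuxer no longer exists.
    sub = std::make_shared<SubscriptionImpl>(*session, "new", log);
}
```

      In this model, recover(false, log) records "old:stale" followed by "new:clean", while recover(true, log) records "old:clean" followed by "new:clean". The only difference between the two runs is whether the old handle is released before or after the session it depends on.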

      I'll attach an example showing the problem and the fix as well as snippets from my backtrace shortly.

      1. backtrace
        3 kB
        Rob Springer
      2. restart_example.cpp
        2 kB
        Rob Springer

        Activity

        Justin Ross made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Gordon Sim made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Fix Version/s: (none) -> 0.17 [ 12320179 ]
        Resolution: (none) -> Fixed [ 1 ]
        Gordon Sim made changes -
        Assignee: (none) -> Gordon Sim [ gsim ]
        Rob Springer added a comment -

        FINALLY - Apologies, but I'm too unfamiliar with this area of the code to be able to suggest a possible fix.

        Rob Springer made changes -
        Attachment: (none) -> backtrace [ 12510498 ]
        Rob Springer added a comment -

        Backtrace demonstrating the issue.

        Rob Springer made changes -
        Attachment: (none) -> restart_example.cpp [ 12510497 ]
        Rob Springer added a comment -

        Compile with:
        g++ -g -Wall restart_example.cpp -o restart_example -I. -I${QPID_PREFIX}/include -L${QPID_PREFIX} -lqpidclient -lqpidcommon -lqpidtypes -lboost_system

        Add -DMAKE_IT_WORK to run with the workaround.

        Procedure:
        1) Start the broker
        2) Start "restart_example"
        3) Kill the broker
        4) See connection exceptions coming from "restart_example"
        5) Restart the broker.

        Without the workaround, "restart_example" will terminate with an assert. With the workaround, no error will be generated.

        Rob Springer created issue -

          People

          • Assignee:
            Gordon Sim
            Reporter:
            Rob Springer
          • Votes:
            0
            Watchers:
            0
