  1. Qpid
  2. QPID-3757

Difficult recovery on broker death

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12
    • Fix Version/s: 0.17
    • Component/s: C++ Client
    • Labels: None
    • Environment: RHEL 4.7 and RHEL 6.2

    Description

      When using the old API (which might make this bug invalid, if the old API is completely deprecated), it is not possible to recover Subscription and LocalQueue variables after the broker dies unless you follow a precise workaround procedure.

      The problem is:
      If the broker dies and is then respawned, and one reconnects to the new broker while reusing the old Session instead of creating a new one, bad things happen (since Session doesn't yet support resume(), I assume that's expected behavior).
      If, however, one creates a new Session, a new SubscriptionManager, and new Subscription objects, an assertion fails (backtrace attached).
      After reading the backtrace, I believe the following is happening:
      1) In recovery, we attempt to assign a new Subscription to the previous Subscription variable (i.e., "sub = subMgr->subscribe()").
      2) That causes the refcount for the old Subscription to fall to 0, so it is cleaned up.
      3) As part of that cleanup, the associated SubscriptionImpl object destroys its demuxRule member (a std::auto_ptr<ScopedDivert>).
      4) That demuxRule member holds a reference to a Demux object, demuxer, which lives inside the Session object. Since the Session object has been re-created, that old reference is invalid, and using it during cleanup triggers the assertion.
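
      The sequence above can be reproduced in a small self-contained model. This is only a sketch: Demux, ScopedDivert, Session, SubscriptionImpl and Subscription below are toy stand-ins named after the classes in the backtrace, not the real qpid::client types, and the helper recoverWithoutReset() is hypothetical. The toy uses std::weak_ptr so it can detect the stale Demux and log it; the real code, as I read the backtrace, holds a plain reference and hits the assertion instead.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the demultiplexer owned by a session (not the Qpid class).
struct Demux {
    int rules = 0;
};

// Plays the role of ScopedDivert: adds a rule on construction and removes it
// on destruction. The weak_ptr is an assumption of this model so that the
// stale case can be observed safely.
class ScopedDivert {
public:
    ScopedDivert(const std::shared_ptr<Demux>& demux, std::vector<std::string>& log)
        : demux_(demux), log_(log) {
        ++demux->rules;
    }
    ~ScopedDivert() {
        if (std::shared_ptr<Demux> d = demux_.lock()) {
            --d->rules;
            log_.push_back("rule removed");
        } else {
            log_.push_back("stale demux");  // the real client asserts here
        }
    }
private:
    std::weak_ptr<Demux> demux_;
    std::vector<std::string>& log_;
};

// Toy session: sole owner of its Demux.
struct Session {
    std::shared_ptr<Demux> demux = std::make_shared<Demux>();
};

struct SubscriptionImpl {
    std::unique_ptr<ScopedDivert> demuxRule;  // std::auto_ptr in the original
};

// Reference-counted handle, like the Subscription variable in the report.
using Subscription = std::shared_ptr<SubscriptionImpl>;

Subscription subscribe(Session& session, std::vector<std::string>& log) {
    Subscription impl = std::make_shared<SubscriptionImpl>();
    impl->demuxRule.reset(new ScopedDivert(session.demux, log));
    return impl;
}

// Steps 1-4: replace the session first, then overwrite the old handle.
std::vector<std::string> recoverWithoutReset() {
    std::vector<std::string> log;
    {
        Session session;
        Subscription sub = subscribe(session, log);
        session = Session();            // old Demux is destroyed here
        sub = subscribe(session, log);  // old impl refcount hits 0, cleanup runs
    }                                   // new impl is cleaned up normally
    return log;
}
```

      In this model the first log entry is "stale demux", produced when the old SubscriptionImpl is released after its Session is already gone; the second entry, "rule removed", comes from the new subscription being cleaned up while its Session still exists.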

      Thus, we have a vicious circle: we need to create a new Session object to be able to proceed, but doing so leaves us unable to reuse the existing Subscription variables.

      Gordon proposed a workaround that does solve the problem for me in practice: assign "null" Subscription and LocalQueue objects to those variables before re-creating the Session object. Unfortunately, this won't be obvious to new users, so anyone still using the old API is likely to run into it.
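
      The ordering the workaround relies on can be captured in a small helper. This is a sketch, not part of Qpid: resetThenRecreate() is a hypothetical name, and it assumes only that the handle types are default-constructible and assignable. Traced and demoOrder() are stand-ins that exist purely to make the order of operations visible.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Qpid API). Releases the old handles *before*
// the session is replaced, so their cleanup runs while the old session's
// internals still exist.
template <class Session, class Subscription, class LocalQueue, class MakeSession>
void resetThenRecreate(Session& session, Subscription& sub, LocalQueue& queue,
                       MakeSession makeSession) {
    sub = Subscription();     // assign a "null" subscription
    queue = LocalQueue();     // assign a "null" local queue
    session = makeSession();  // only now replace the session
}

// Stand-in handle that records when its previous value is released.
struct Traced {
    std::vector<std::string>* log;
    std::string name;

    Traced() : log(nullptr) {}
    Traced(std::vector<std::string>* l, const std::string& n) : log(l), name(n) {}
    Traced(const Traced&) = default;

    Traced& operator=(const Traced& other) {
        if (log) log->push_back("release " + name);  // old value goes away
        log = other.log;
        name = other.name;
        return *this;
    }
};

// Runs the helper on stand-ins and returns the observed release order.
std::vector<std::string> demoOrder() {
    std::vector<std::string> log;
    Traced session(&log, "session");
    Traced sub(&log, "subscription");
    Traced queue(&log, "queue");
    resetThenRecreate(session, sub, queue,
                      [&log] { return Traced(&log, "session"); });
    return log;
}
```

      With the old client API, I would expect the call to look something like resetThenRecreate(session, sub, queue, [&] { return connection.newSession(); }) once the connection has been reopened, followed by creating a new SubscriptionManager and subscribing again. That usage is an assumption on my part; the attached restart_example.cpp is the authoritative example.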

      I'll attach an example showing the problem and the fix as well as snippets from my backtrace shortly.

      Attachments

        1. backtrace
          3 kB
          Rob Springer
        2. restart_example.cpp
          2 kB
          Rob Springer

          People

            gsim Gordon Sim
            rspringer Rob Springer