Qpid / QPID-3369

Loss of cluster elder can result in loss of whole cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.7
    • Fix Version/s: None
    • Component/s: C++ Clustering
    • Labels:
      None

      Description

      If the cluster elder is lost, it is possible that the remaining nodes of the
      cluster will fail with the following errors:

      Error delivering frames: Cluster timer wakeup non-existent task
      ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:93)

      – or –

      Error delivering frames: Cluster timer drop non-existent task
      ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:109)

      When a member is promoted to be the elder, the ClusterTimer::becomeElder()
      method will add all the current cluster tasks to the Timer. However, there is
      a potential race condition where CPG can deliver the timer wakeup from the
      original elder.
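
      The race can be illustrated with a toy model. This is only a sketch and
      not the Qpid implementation: the class ToyClusterTimer, its methods, and
      the rule that a delivered wakeup consumes the pending task entry are
      assumptions made for illustration. It shows one interleaving that would
      produce the first error above: a wakeup sent by the original elder is
      still in flight when the new elder re-arms the same task.

```python
class ToyClusterTimer:
    """Toy stand-in for a cluster timer. Hypothetical, NOT the Qpid code."""

    def __init__(self):
        self.pending = {}      # task name -> callback (same on every member)
        self.is_elder = False
        self.outbox = []       # wakeups this member would multicast

    def add(self, name, callback):
        self.pending[name] = callback
        if self.is_elder:
            self.outbox.append(name)

    def become_elder(self):
        # As in the description: the new elder re-arms every current task.
        self.is_elder = True
        self.outbox.extend(self.pending)

    def deliver_wakeup(self, name):
        # Assumption: a delivered wakeup consumes the pending entry.
        if name not in self.pending:
            raise RuntimeError(f"Cluster timer wakeup non-existent task {name}")
        callback = self.pending.pop(name)
        callback()


def simulate_elder_loss():
    """Replay one interleaving: an old-elder wakeup plus a new-elder wakeup."""
    member = ToyClusterTimer()
    fired = []
    task = "ManagementAgent::periodicProcessing"
    member.add(task, lambda: fired.append(task))

    # Wakeup multicast by the original elder just before it was lost.
    in_flight = [task]

    # The surviving member is promoted and re-arms the same task.
    member.become_elder()
    in_flight.extend(member.outbox)

    errors = []
    for name in in_flight:
        try:
            member.deliver_wakeup(name)
        except RuntimeError as exc:
            errors.append(str(exc))
    return fired, errors
```

      In this model the first delivery fires the task and the second finds
      nothing to wake. The actual sequence in ClusterTimer.cpp may differ.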

      1. QPID-3369.patch
        0.5 kB
        Jason Dillaman

        Activity

        Alan Conway made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Fix Version/s Future [ 12315490 ]
        Resolution Cannot Reproduce [ 5 ]
        Jason Dillaman added a comment -

        I believe the issue would still exist, but we didn't run into a case where the cluster elder crashed during our latest round of testing.

        Justin Ross added a comment -

        Alan and Jason, what's the status here? I think Jason is saying that the defect is still there.

        Alan Conway made changes -
        Fix Version/s Future [ 12315490 ]
        Fix Version/s 0.11 [ 12316272 ]
        Jason Dillaman made changes -
        Resolution Duplicate [ 3 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Jason Dillaman added a comment -

        Issue was originally encountered using a version of Qpid which had incorporated the QPID-3280 patch.

        Alan Conway made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.11 [ 12316272 ]
        Resolution Duplicate [ 3 ]
        Alan Conway added a comment -

        Fixed as a side effect of the fix for QPID-3280.

        LIU added a comment -

        I got here from Google.
        I recently encountered this error while testing a C++ cluster. Steps to reproduce:
        1. Build a cluster with 2 brokers, A and B, and confirm it is working.
        2. Connect a client to broker A that both produces and consumes messages continuously.
        3. On broker B, run the shell command <iptables -A INPUT -s "A" -j DROP> so that B drops connections from A. I wanted to test network error handling in the cluster.
        4. The client receives the broker update info and B is removed from the cluster, which is expected.
        5. After some seconds (maybe 10), run <iptables -D INPUT 1> on B so that B accepts connections from A again.
        6. After some time, both brokers A and B log this error and shut down: critical Error delivering frames: Cluster timer wakeup non-existent task ManagementAgent::periodicProcessing (qpid/cluster/ClusterTimer.cpp:111)

        I'm using Qpid 0.10 on RHEL 5.5.

        Alan Conway added a comment -

        Do you have a test to reproduce the issue?

        Jason Dillaman made changes -
        Field Original Value New Value
        Attachment QPID-3369.patch [ 12487731 ]
        Jason Dillaman added a comment -

        Potential patch for issue

        Jason Dillaman created issue -

          People

          • Assignee: Alan Conway
          • Reporter: Jason Dillaman
          • Votes: 0
          • Watchers: 1
