GEODE-8690

Member that fails availability check is never suspected again

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0, 1.13.0, 1.14.0
    • Fix Version/s: None
    • Component/s: membership
    • Labels: None

      Description

      In a test run on support/1.12 there was a cluster with 3 locators and a number of servers. It had a membership view like this:

      [ loc1, loc2, loc3, server1, server2, etc]
      

      The test killed loc1 and loc2 and then tried to restart loc2. In this scenario loc3 should have detected the loss of the other two locators and become the membership coordinator, but it did not. loc3 detected the loss of loc2 and then received a LEAVE request from loc1. At that point it should have either started examining loc2 again or simply become the coordinator, but it did neither, and the cluster was left with no coordinator.

      This is similar to GEODE-3780 but in that case an earlier availability check passed.
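
The expected behavior described above can be illustrated with a minimal sketch. This is not Geode's membership code: the class `CoordinatorSketch`, the method `expectedCoordinator`, and the set names are hypothetical, and the sketch assumes the coordinator is simply the first eligible member in view order. It shows that excluding only members that have left yields loc2 as the would-be coordinator, whereas also excluding a member that is still suspect (or has failed its availability check) yields loc3.

```java
import java.util.List;
import java.util.Set;

public class CoordinatorSketch {

    /**
     * Hypothetical helper, not a Geode API: returns the first member in view
     * order that has neither left nor been marked suspect/failed, or null if
     * no such member remains.
     */
    public static String expectedCoordinator(List<String> view,
                                             Set<String> left,
                                             Set<String> suspectOrFailed) {
        for (String member : view) {
            if (!left.contains(member) && !suspectOrFailed.contains(member)) {
                return member;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Simplified view from the description: [loc1, loc2, loc3, server1, server2]
        List<String> view = List.of("loc1", "loc2", "loc3", "server1", "server2");
        Set<String> left = Set.of("loc1");      // loc1 sent a LEAVE request
        Set<String> suspect = Set.of("loc2");   // loc2 is under an availability check

        // Only the departed member is excluded: loc2 would be the coordinator.
        System.out.println(expectedCoordinator(view, left, Set.of()));   // prints "loc2"

        // The suspect member is excluded as well: loc3 would be the coordinator.
        System.out.println(expectedCoordinator(view, left, suspect));    // prints "loc3"
    }
}
```

The first call mirrors the "coordinator would be" log line below, where the dead locator is still chosen. The second call is one possible reading of what loc3 "ought to have" concluded; re-running the availability check on loc2 would be the other option named in the description.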

      In the test run the locators are named as follows:
      loc1 = locatorgemfire_4_3
      loc2 = locatorgemfire_4_4
      loc3 = locatorgemfire_4_2

      [info 2020/10/30 21:51:51.197 PDT <P2P message reader for (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005 shared unordered uid=2 port=42550> tid=0x36] Performing availability check for suspect member (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005 reason=member unexpectedly shut down shared, unordered connection
      
      [info 2020/10/30 21:51:51.309 PDT <Pooled High Priority Message Processor 3> tid=0x51] received leave request from (locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004 for (locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004
      
      [info 2020/10/30 21:51:51.345 PDT <Pooled High Priority Message Processor 3> tid=0x51] Checking to see if I should become coordinator.  My address is (locatorgemfire_4_2_host2_3852:3852:locator)<ec><v1>:41007
      
      [info 2020/10/30 21:51:51.346 PDT <Pooled High Priority Message Processor 3> tid=0x51] View with removed and left members removed is View[rs-(locatorgemfire_4_3_host2_3866:3866:locator)<ec><v0>:41004|3] members: [(locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005, (locatorgemfire_4_2_host2_3852:3852:locator)<ec><v1>:41007, (locatorgemfire_4_1_host2_3843:3843:locator)<ec><v1>:41006, (peergemfire_4_1_host2_3959:3959)<ec><v2>:41010{lead}, (peergemfire_4_2_host2_3967:3967)<ec><v2>:41009] and coordinator would be (locatorgemfire_4_4_host2_3884:3884:locator)<ec><v1>:41005
      

            People

            • Assignee: Unassigned
            • Reporter: Bruce J Schuchardt (bschuchardt)