Geode / GEODE-6522

Server hangs during shutdown after becoming membership coordinator

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: membership
    • Labels: None

    Description

      Recent changes to the processing of "leave" requests can cause a member to become the membership coordinator when it receives a leave request from one member and all other potential coordinators have failed availability checks.  This is causing a hang during shutdown.  The new coordinator sends out a new view, but that view still contains the members that failed availability checks.  As a result, the new View Creator thread is stopped.  Another one isn't started until additional suspect processing is performed, and that doesn't always happen.  Shutdown can then hang while other threads try to contact servers that are no longer there.
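
      To make the expected behavior concrete, here is a minimal sketch of how a new coordinator might build its first view so that members that failed availability checks are left out along with members that shut down. This is not the GMSJoinLeave code: the class ViewSketch, the method nextViewMembers, and the short member names are all hypothetical. The data is abbreviated from the log excerpt in this issue, and it assumes the suspect locator on port 41002 is the member that failed its availability check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Geode.
final class ViewSketch {

  // Builds the member list for the next view: every current member except
  // those that announced shutdown and those that failed an availability check.
  static List<String> nextViewMembers(List<String> currentMembers,
                                      Set<String> shutdownMembers,
                                      Set<String> failedAvailabilityCheck) {
    List<String> next = new ArrayList<>();
    for (String member : currentMembers) {
      boolean gone = shutdownMembers.contains(member)
          || failedAvailabilityCheck.contains(member);
      if (!gone) {
        next.add(member);
      }
    }
    return next;
  }

  public static void main(String[] args) {
    // Members of the old view (view id 3), abbreviated to role:port.
    List<String> oldView = Arrays.asList(
        "locator:41000", "locator:41002", "locator:41003", "locator:41005",
        "bridge:41014", "bridge:41015", "bridge:41019", "bridge:41023");
    // Locators listed under "shutdown" in the new view.
    Set<String> shutdown = new HashSet<>(
        Arrays.asList("locator:41000", "locator:41003", "locator:41005"));
    // Assumption: the suspect locator is the one that failed its check.
    Set<String> failed = new HashSet<>(Arrays.asList("locator:41002"));

    System.out.println(nextViewMembers(oldView, shutdown, failed));
  }
}
```

      In the log excerpt, view 15 still lists the locator on port 41002 as a member. Under the assumptions above, the sketch would leave only the four bridge servers in the view.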

       

      [info 2019/03/13 08:38:24.959 PDT <Geode Membership View Creator> tid=0x147] finished waiting for responses to view preparation
      
      [info 2019/03/13 08:38:24.959 PDT <Geode Membership View Creator> tid=0x147] received new view: View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|15] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000, turtle(locatorgemfire_2_4_host1_17715:17715:locator)<ec><v1>:41003, turtle(locatorgemfire_2_2_host1_17676:17676:locator)<ec><v1>:41005]
      old view is: View[turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000|3] members: [turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000, turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(locatorgemfire_2_4_host1_17715:17715:locator)<ec><v1>:41003, turtle(locatorgemfire_2_2_host1_17676:17676:locator)<ec><v1>:41005, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023]
      
      [info 2019/03/13 08:38:24.974 PDT <Geode Membership View Creator> tid=0x147] Failure detection is now watching turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015; suspects are {turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002=View[turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000|3] members: [turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000, turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(locatorgemfire_2_4_host1_17715:17715:locator)<ec><v1>:41003, turtle(locatorgemfire_2_2_host1_17676:17676:locator)<ec><v1>:41005, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023]}
      
      [info 2019/03/13 08:38:24.981 PDT <Geode Membership View Creator> tid=0x147] sending new view View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|15] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000, turtle(locatorgemfire_2_4_host1_17715:17715:locator)<ec><v1>:41003, turtle(locatorgemfire_2_2_host1_17676:17676:locator)<ec><v1>:41005]
      
      [info 2019/03/13 08:38:24.981 PDT <Geode Membership View Creator> tid=0x147] BRUCE: setting shutdown flag in view creator
      java.lang.Exception: stack trace
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.setShutdownFlag(GMSJoinLeave.java:2247)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2713)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2220)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2299)
      
      [info 2019/03/13 08:38:24.982 PDT <Geode Membership View Creator> tid=0x147] View Creator thread is exiting
      
      [info 2019/03/13 08:38:24.982 PDT <Geode Membership View Creator> tid=0x147] BRUCE: setting shutdown flag in view creator
      java.lang.Exception: stack trace
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.setShutdownFlag(GMSJoinLeave.java:2247)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2379)
      
      [info 2019/03/13 08:38:26.416 PDT <vm_4_thr_8_bridge_2_1_host1_17023> tid=0x150] GemFireCache[id = 66144348; isClosing = true; isShutDownAll = false; created = Wed Mar 13 08:34:41 PDT 2019; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
      
      
      
      ...
      
      [info 2019/03/13 08:38:27.495 PDT <Geode Membership View Creator> tid=0x161] View Creator thread is starting
      
      [info 2019/03/13 08:38:27.507 PDT <Geode Membership View Creator> tid=0x161] preparing new view View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|21] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019]
      
      [info 2019/03/13 08:38:27.508 PDT <Geode Membership View Creator> tid=0x161] finished waiting for responses to view preparation
      
      [info 2019/03/13 08:38:27.508 PDT <Geode Membership View Creator> tid=0x161] received new view: View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|21] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019]
      old view is: View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|15] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(locatorgemfire_2_3_host1_17698:17698:locator)<ec><v0>:41000, turtle(locatorgemfire_2_4_host1_17715:17715:locator)<ec><v1>:41003, turtle(locatorgemfire_2_2_host1_17676:17676:locator)<ec><v1>:41005]
      
      [info 2019/03/13 08:38:27.566 PDT <Geode Membership View Creator> tid=0x161] sending new view View[turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014|21] members: [turtle(locatorgemfire_2_1_host1_17653:17653:locator)<ec><v1>:41002, turtle(bridgegemfire_2_1_host1_17023:17023)<ec><v2>:41014{lead}, turtle(bridgegemfire_2_4_host1_17100:17100)<ec><v2>:41015, turtle(bridgegemfire_2_3_host1_17064:17064)<ec><v3>:41023] shutdown: [turtle(bridgegemfire_2_2_host1_17048:17048)<ec><v3>:41019]
      
      [info 2019/03/13 08:38:27.567 PDT <Geode Membership View Creator> tid=0x161] BRUCE: setting shutdown flag in view creator
      java.lang.Exception: stack trace
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.setShutdownFlag(GMSJoinLeave.java:2247)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.prepareAndSendView(GMSJoinLeave.java:2713)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.sendInitialView(GMSJoinLeave.java:2220)
      at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave$ViewCreator.run(GMSJoinLeave.java:2299)
      
      [info 2019/03/13 08:38:27.567 PDT <Geode Membership View Creator> tid=0x161] View Creator thread is exiting
      
      

      etc.

       

            People

              Assignee: bschuchardt Bruce J Schuchardt
              Reporter: bschuchardt Bruce J Schuchardt

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h