GEODE-8721

member that should become coordinator never detects loss of current coordinator

Details

    Description

      During a network partition, a server that should have become membership coordinator and shut down its side of the partition never detected the loss of a server on the other side of the partition. Instead, it continually performed availability checks on that other server, and the checks passed. Its log file showed continually increasing timestamps for when it claimed the other server had last contacted it, which was not possible because of the network partition (which was formed through iptables manipulation).

      At least one other server on its side of the network partition was doing the same thing. It looks like they were interfering with each other's availability checks in some way.
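
To make the symptom concrete, here is a minimal sketch of a timestamp-based availability check. It is a hypothetical model for illustration only: the class `AvailabilitySketch`, its methods, and the 5-second window are assumptions, not Geode's actual classes or settings. It shows why a "last heard from" timestamp that keeps advancing makes the check pass indefinitely, even though a truly partitioned member should stop refreshing it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration; not Geode's implementation.
class AvailabilitySketch {
    // Last time (in millis) a message was recorded as coming from each member.
    private final Map<String, Long> lastHeardMillis = new HashMap<>();
    // Assumed tolerance: traffic newer than this counts as "recent".
    private final long windowMillis;

    public AvailabilitySketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Called whenever traffic is attributed to a member.
    public void recordMessageFrom(String member, long nowMillis) {
        lastHeardMillis.put(member, nowMillis);
    }

    // The check passes if traffic attributed to the suspect is recent enough.
    public boolean availabilityCheckPasses(String suspect, long nowMillis) {
        Long last = lastHeardMillis.get(suspect);
        return last != null && nowMillis - last <= windowMillis;
    }

    public static void main(String[] args) {
        AvailabilitySketch view = new AvailabilitySketch(5_000);
        String suspect = "locatorp2";

        // Before the partition the suspect is heard from normally.
        view.recordMessageFrom(suspect, 0);

        // Expected behaviour: after the partition nothing refreshes the
        // timestamp, so the check eventually fails.
        System.out.println(view.availabilityCheckPasses(suspect, 10_000)); // false

        // Reported behaviour: the timestamp keeps advancing (by whatever
        // means), so every later check passes.
        view.recordMessageFrom(suspect, 9_000);
        System.out.println(view.availabilityCheckPasses(suspect, 10_000)); // true
    }
}
```

The sketch does not say what refreshed the timestamp in the reported run; it only illustrates that any spurious refresh is enough to keep a suspect member from ever being removed.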

      locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020
      
      locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
      
      
      bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
      
      
      bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020
      
      bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
      
      
      locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
      
      
      bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
      
      
      bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020
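
One way to confirm the advancing timestamps across many log files is to extract the time reported in each "detected recent message traffic" line and check that the values keep increasing. The following is a rough sketch: the class name `TrafficTimes` and its methods are hypothetical helpers, and it compares only the time of day, assuming all lines fall on the same date as in the excerpt above.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting system.log excerpts.
class TrafficTimes {
    // Captures HH:mm:ss from "... at time Tue Oct 20 22:23:12 PDT 2020".
    private static final Pattern TRAFFIC = Pattern.compile(
        "detected recent message traffic for suspect member .* at time "
            + "\\w{3} \\w{3} \\d{1,2} (\\d{2}:\\d{2}:\\d{2})");

    // Returns the reported traffic times, in the order the lines appear.
    public static List<LocalTime> reportedTimes(List<String> logLines) {
        List<LocalTime> times = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = TRAFFIC.matcher(line);
            if (m.find()) {
                times.add(LocalTime.parse(m.group(1)));
            }
        }
        return times;
    }

    // True if every time is later than the one before it.
    public static boolean strictlyIncreasing(List<LocalTime> times) {
        for (int i = 1; i < times.size(); i++) {
            if (!times.get(i).isAfter(times.get(i - 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The three "detected recent message traffic" lines from the excerpt.
        List<String> lines = Arrays.asList(
            "locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020",
            "bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020",
            "bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020");

        List<LocalTime> times = reportedTimes(lines);
        System.out.println(times);                     // [22:23:12, 22:23:14, 22:23:16]
        System.out.println(strictlyIncreasing(times)); // true
    }
}
```

For the excerpt, the reported traffic times advance from 22:23:12 to 22:23:16 across the three servers, which matches the description: the suspect member appears to keep sending messages although the partition should have made that impossible.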
      

            People

              bschuchardt Bruce J Schuchardt