Details
Bug
Status: Closed
Major
Resolution: Fixed
1.14.0
Description
During a network partition, a server that should have become membership coordinator and shut down its side of the partition never detected the loss of a server on the other side. Instead, it repeatedly performed availability checks on that server, and the checks passed. Its log file showed steadily increasing timestamps for when the other server had supposedly contacted it, which was impossible given the network partition (which was formed through iptables manipulation).
At least one other server on its side of the network partition was doing the same thing. It looks like they were interfering with each other's availability checks in some way.
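To make the suspected interaction concrete, here is a toy model of one way two servers could keep each other's checks passing. It is an illustration only, not Geode's implementation and not a confirmed diagnosis. The `HealthMonitor` class, the `refresh_on_peer_notice` flag, the 5-second window, and the 2-second check interval are all assumptions made for the sketch.

```python
class HealthMonitor:
    """Toy timestamp-based availability check (hypothetical, not Geode's code)."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self.last_contact = {}  # member id -> time traffic was last recorded

    def record_contact(self, member, now):
        self.last_contact[member] = now

    def availability_check(self, member, now):
        # Pass if traffic from the member was recorded within the window.
        last = self.last_contact.get(member)
        return last is not None and (now - last) <= self.window_seconds


def simulate(rounds, refresh_on_peer_notice, window_seconds=5.0):
    """Two servers on the same side of a partition check an unreachable member.

    The suspect sends nothing after t=0. When refresh_on_peer_notice is True
    (the assumed faulty behavior), a server whose check passes causes its peer
    to record fresh contact for the suspect.
    """
    suspect = "locator-on-other-side"
    a = HealthMonitor(window_seconds)
    b = HealthMonitor(window_seconds)
    a.record_contact(suspect, 0.0)  # last genuine contact before the partition
    b.record_contact(suspect, 0.0)

    results = []
    for step in range(1, rounds + 1):
        now = step * 2.0  # assumed check interval of 2 seconds
        a_ok = a.availability_check(suspect, now)
        if a_ok and refresh_on_peer_notice:
            b.record_contact(suspect, now)
        b_ok = b.availability_check(suspect, now)
        if b_ok and refresh_on_peer_notice:
            a.record_contact(suspect, now)
        results.append((a_ok, b_ok))
    return results
```

In this model, `simulate(10, False)` starts failing once the window expires, while `simulate(10, True)` never fails because each server keeps advancing the other's recorded contact time, which would produce the steadily increasing timestamps described above.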
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020
locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
bridgep1_25995/system.log: [info 2020/10/20 22:23:16.229 PDT <unicast receiver,rs-F21040449a0i3large-72-61636> tid=0x23] No longer suspecting 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020
bridgep1_25998/system.log: [info 2020/10/20 22:23:17.213 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000
locatorp1_26023/system.log: [info 2020/10/20 22:23:17.232 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
bridgep1_25998/system.log: [info 2020/10/20 22:23:18.215 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Performing availability check for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 reason=Unable to send messages to this member via JGroups
bridgep1_25995/system.log: [info 2020/10/20 22:23:21.006 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-61636> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:16 PDT 2020
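When triaging logs like these, it can help to pull out the contact times each server claims and compare them with when the partition was created. The following is a minimal sketch that assumes one log entry per line in the merged format shown above (file name prefix, then the bracketed header). The regular expressions and the `claimed_contacts` helper are written for this excerpt and are not part of Geode. Time zones are ignored on the assumption that both stamps in an entry use the same zone.

```python
import re
from datetime import datetime

# Timestamp at which the entry was logged, e.g. "[info 2020/10/20 22:23:16.227 PDT "
LOG_TIME = re.compile(r"\[info (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+ ")
# Suspect member and the contact time the server claims to have observed
CLAIMED = re.compile(
    r"recent message traffic for suspect member (\S+) "
    r"at time \w{3} (\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})"
)


def claimed_contacts(lines):
    """Yield (log_file, logged_at, member, claimed_at) for each matching entry."""
    for line in lines:
        logged = LOG_TIME.search(line)
        claimed = CLAIMED.search(line)
        if not (logged and claimed):
            continue  # skip entries that do not report message traffic
        log_file = line.split(":", 1)[0]
        logged_at = datetime.strptime(logged.group(1), "%Y/%m/%d %H:%M:%S.%f")
        claimed_at = datetime.strptime(
            f"{claimed.group(2)} {claimed.group(3)}", "%b %d %H:%M:%S %Y"
        )
        yield log_file, logged_at, claimed.group(1), claimed_at


# Three entries copied from the excerpt above; the second has no claimed time.
SAMPLE = [
    "locatorp1_26023/system.log: [info 2020/10/20 22:23:16.227 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:12 PDT 2020",
    "locatorp1_26023/system.log: [info 2020/10/20 22:23:16.228 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-47481> tid=0x23] Availability check passed for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000",
    "bridgep1_25998/system.log: [info 2020/10/20 22:23:17.212 PDT <Geode UDP Timer-2,rs-F21040449a0i3large-72-2074> tid=0x21] Availability check detected recent message traffic for suspect member 10.32.109.233(locatorp2_host2_21762:21762:locator)<ec><v0>:41000 at time Tue Oct 20 22:23:14 PDT 2020",
]

for log_file, logged_at, member, claimed_at in claimed_contacts(SAMPLE):
    lag = (logged_at - claimed_at).total_seconds()
    print(f"{log_file}: claims contact at {claimed_at} ({lag:.3f}s before the check)")
```

Any claimed contact time later than the moment the iptables rules went into effect points to a timestamp that could not have come from the partitioned member itself.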