GEODE-8697

Propagate ForcedDisconnectException to the user application in a network partition scenario

Details

    Description

      During a network partition, we expect the coordinator to close its cluster with a ForcedDisconnectException. However, in some cases, threads end up with a MemberDisconnectedException instead.

      System logs show that a ForcedDisconnect has happened:

      org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Membership coordinator 10.32.111.185(gemfire3_host1_7340:7340:locator)<ec><v0>:41000 has declared that a network partition has occurred
       at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.forceDisconnect(GMSMembership.java:2007)
       at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.forceDisconnect(GMSJoinLeave.java:1085)
       at org.apache.geode.distributed.internal.membership.gms.membership.GMSJoinLeave.processMessage(GMSJoinLeave.java:1422)
       at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1327)
       at org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger$JGroupsReceiver.receive(JGroupsMessenger.java:1266)
       at org.jgroups.JChannel.invokeCallback(JChannel.java:816)
       at org.jgroups.JChannel.up(JChannel.java:741)
       at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1030)
       at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
       at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
       at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1077)
       at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:792)
       at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:433)
       at org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.up(StatRecorder.java:72)
       at org.apache.geode.distributed.internal.membership.gms.messenger.AddressManager.up(AddressManager.java:70)
       at org.jgroups.protocols.TP.passMessageUp(TP.java:1658)
       at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1876)
       at org.jgroups.util.DirectExecutor.execute(DirectExecutor.java:10)
       at org.jgroups.protocols.TP.handleSingleMessage(TP.java:1789)
       at org.jgroups.protocols.TP.receive(TP.java:1714)
       at org.apache.geode.distributed.internal.membership.gms.messenger.Transport.receive(Transport.java:159)
       at org.jgroups.protocols.UDP$PacketReceiver.run(UDP.java:701)
       at java.lang.Thread.run(Thread.java:748)

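      But the ForcedDisconnectException is never propagated up to the user application. The application thread instead sees a DistributedSystemDisconnectedException whose cause is the MemberDisconnectedException: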

      org.apache.geode.distributed.DistributedSystemDisconnectedException: This connection to a distributed system has been disconnected., caused by org.apache.geode.distributed.internal.membership.api.MemberDisconnectedException: Membership coordinator 10.32.111.185(gemfire3_host1_7340:7340:locator)<ec><v0>:41000 has declared that a network partition has occurred
       at org.apache.geode.distributed.internal.InternalDistributedSystem.checkConnected(InternalDistributedSystem.java:978)
       at org.apache.geode.distributed.internal.InternalDistributedSystem.getDistributionManager(InternalDistributedSystem.java:1679)
       at org.apache.geode.distributed.internal.ReplyProcessor21.getDistributionManager(ReplyProcessor21.java:366)
       at org.apache.geode.distributed.internal.ReplyProcessor21.postWait(ReplyProcessor21.java:600)
       at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:824)
       at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
       at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
       at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:584)
       at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4463)
       at org.apache.geode.internal.cache.PartitionedRegionDataView.getBucketKeys(PartitionedRegionDataView.java:118)
       at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6180)
       at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6146)
       at org.apache.geode.internal.cache.PartitionedRegion$KeysSet.toArray(PartitionedRegion.java:6251)
       at org.apache.geode.internal.cache.PartitionedRegion$KeysSet.toArray(PartitionedRegion.java:6245)
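
To make the symptom concrete, here is a minimal sketch of how an application could check which exception type appears in the cause chain of what it catches. It does not use Geode: the three exception classes are local stand-ins that only borrow the simple names from the traces above (their hierarchy is an assumption, not Geode's), and `CauseChainDemo` and `findCause` are hypothetical names introduced for illustration.

```java
import java.util.Optional;

public class CauseChainDemo {

    // Stand-ins (assumption): these only mirror the simple names of the Geode
    // exception types seen in the traces, so the sketch compiles without Geode.
    public static class ForcedDisconnectException extends RuntimeException {
        public ForcedDisconnectException(String message) {
            super(message);
        }
    }

    public static class MemberDisconnectedException extends RuntimeException {
        public MemberDisconnectedException(String message) {
            super(message);
        }
    }

    public static class DistributedSystemDisconnectedException extends RuntimeException {
        public DistributedSystemDisconnectedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Returns the first throwable in the cause chain (starting with {@code top}
     * itself) that is an instance of {@code type}, or empty if none is found.
     * The walk is bounded so a cyclic cause chain cannot loop forever.
     */
    public static <T extends Throwable> Optional<T> findCause(Throwable top, Class<T> type) {
        Throwable current = top;
        for (int depth = 0; current != null && depth < 64; depth++) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Rebuild the shape of the exception the application received in this report.
        Throwable reported = new DistributedSystemDisconnectedException(
                "This connection to a distributed system has been disconnected.",
                new MemberDisconnectedException("a network partition has occurred"));

        // The expected type is absent from the chain; the unexpected one is present.
        System.out.println(findCause(reported, ForcedDisconnectException.class).isPresent());   // false
        System.out.println(findCause(reported, MemberDisconnectedException.class).isPresent()); // true
    }
}
```

An application that handles forced disconnects by looking for ForcedDisconnectException in the cause chain would take the "not found" branch here, which is the behavior this issue describes.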

            People

              bschuchardt Bruce J Schuchardt
              kaslami Kamilla Aslami