Geode / GEODE-9350

MemberJoinedEvent should be triggered after new view is installed

Details

    Description

      While investigating GEODE-9070, we noticed that soon after a server tries to join a cluster, membership fails with ShunnedMemberException:

      org.apache.geode.distributed.internal.direct.ShunnedMemberException: Member is being shunned: ccf730fb2b62(161)<v2>:41002
       at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:469)
       at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:283)
       at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:190)
       at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:550)
       at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:354)
       at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:296)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2068)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1983)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2028)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1085)
       at org.apache.geode.internal.cache.execute.StreamingFunctionOperation.getFunctionResultFrom(StreamingFunctionOperation.java:113)
       at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:149)
       at org.apache.geode.internal.cache.execute.MemberFunctionExecutor.executeFunction(MemberFunctionExecutor.java:191)
       at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:397)
       at org.apache.geode.internal.cache.execute.AbstractExecution.execute(AbstractExecution.java:402)
       at org.apache.geode.modules.util.BootstrappingFunction.bootstrapMember(BootstrappingFunction.java:170)
       at org.apache.geode.modules.util.BootstrappingFunction.memberJoined(BootstrappingFunction.java:240)
       at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberJoinedEvent.handleEvent(ClusterDistributionManager.java:2498)
       at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2451)
       at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2440)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1406)
       at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:109)
       at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1438)
       at java.base/java.lang.Thread.run(Thread.java:834)

      Further analysis showed that ShunnedMemberException is thrown because GMSMembership.memberExists() returns false, which means that the member ccf730fb2b62(161)<v2>:41002 was not in the view. The stack trace shows that BootstrappingFunction.bootstrapMember() is executed on MemberJoinedEvent, which is triggered by MembershipListener.newMemberConnected(). newMemberConnected() is called in GMSMembership.processView() before the new view is installed, so the failure most likely happens because BootstrappingFunction receives the event before the view has actually been updated. A possible solution is to change GMSMembership.processView() to call MembershipListener.newMemberConnected() only after the new view is installed.
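
The ordering problem can be illustrated with a small, self-contained sketch. It is not the Geode code: SimpleMembership, JoinListener, processViewNotifyFirst() and processViewInstallFirst() are hypothetical stand-ins, and the callback runs synchronously here, whereas in Geode the event is handled on a separate thread, which is why the real failure is timing-dependent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ViewInstallOrderSketch {

  /** Hypothetical stand-in for MembershipListener (only the join callback). */
  interface JoinListener {
    void newMemberConnected(String member);
  }

  /** Hypothetical, heavily simplified stand-in for GMSMembership. */
  static class SimpleMembership {
    private volatile Set<String> latestView = new HashSet<>();
    private final JoinListener listener;

    SimpleMembership(JoinListener listener) {
      this.listener = listener;
    }

    /** Lock-free lookup against whatever view is currently installed. */
    boolean memberExists(String member) {
      return latestView.contains(member);
    }

    /** Order described in the ticket: notify listeners, then install the view. */
    void processViewNotifyFirst(Set<String> newView) {
      List<String> joined = newMembers(newView);
      for (String member : joined) {
        listener.newMemberConnected(member);
      }
      latestView = new HashSet<>(newView);
    }

    /** Proposed order: install the view, then notify listeners. */
    void processViewInstallFirst(Set<String> newView) {
      List<String> joined = newMembers(newView);
      latestView = new HashSet<>(newView);
      for (String member : joined) {
        listener.newMemberConnected(member);
      }
    }

    /** Members present in the new view but absent from the installed one. */
    private List<String> newMembers(Set<String> newView) {
      List<String> joined = new ArrayList<>();
      for (String member : newView) {
        if (!latestView.contains(member)) {
          joined.add(member);
        }
      }
      return joined;
    }
  }

  /** Returns what memberExists() reports from inside the join callback. */
  static boolean visibleInCallback(boolean installFirst) {
    boolean[] seen = new boolean[1];
    SimpleMembership[] holder = new SimpleMembership[1];
    holder[0] = new SimpleMembership(m -> seen[0] = holder[0].memberExists(m));
    Set<String> view = Collections.singleton("server-1");
    if (installFirst) {
      holder[0].processViewInstallFirst(view);
    } else {
      holder[0].processViewNotifyFirst(view);
    }
    return seen[0];
  }

  public static void main(String[] args) {
    System.out.println("notify first:  " + visibleInCallback(false));
    System.out.println("install first: " + visibleInCallback(true));
  }
}
```

In this sketch the callback sees the member as missing when notification comes first (false) and as present when the view is installed first (true), which mirrors the condition under which memberExists() returns false and the sender treats the member as shunned.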

      This issue was introduced by the fix for GEODE-7245, which removed the latestView lock from GMSMembership.memberExists(). Before GEODE-7245, this method waited until GMSMembership.processView() released the lock, so the problem described above could not happen. GEODE-7245 was back-ported to 1.14.
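
The effect of the removed lock can be sketched as follows. This is an illustration under assumptions, not the pre-GEODE-7245 code: the ticket only mentions a "latestView lock", so the use of a ReentrantReadWriteLock, the class name LatestViewLockSketch, and the single-threaded executor standing in for the event thread are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LatestViewLockSketch {
  // Assumed lock type; the ticket only says "latestView lock".
  private final ReadWriteLock latestViewLock = new ReentrantReadWriteLock();
  private Set<String> latestView = new HashSet<>();

  /** Locked lookup: blocks while processView() holds the write lock. */
  boolean memberExists(String member) {
    latestViewLock.readLock().lock();
    try {
      return latestView.contains(member);
    } finally {
      latestViewLock.readLock().unlock();
    }
  }

  /**
   * Dispatches the join callback to another thread before installing the view,
   * all while holding the write lock, so the callback cannot observe the old view.
   */
  Future<Boolean> processView(Set<String> newView, String joined, ExecutorService eventThread) {
    latestViewLock.writeLock().lock();
    try {
      Callable<Boolean> callback = () -> memberExists(joined);
      Future<Boolean> seen = eventThread.submit(callback); // event fired early
      latestView = new HashSet<>(newView);                 // view installed later
      return seen;
    } finally {
      latestViewLock.writeLock().unlock();
    }
  }

  /** Runs one join and returns what the callback saw. */
  static boolean callbackSeesMember() {
    ExecutorService eventThread = Executors.newSingleThreadExecutor();
    try {
      LatestViewLockSketch membership = new LatestViewLockSketch();
      Set<String> view = Collections.singleton("server-1");
      return membership.processView(view, "server-1", eventThread).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      eventThread.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("callback sees member: " + callbackSeesMember());
  }
}
```

Because the lookup has to wait for the write lock to be released, the callback always observes the new view here, even though the event is dispatched before the view is installed. With a lock-free memberExists(), the same early dispatch becomes a race.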

            People

              kaslami Kamilla Aslami