CI Failure: ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterLocatorReconnects

Details

    Description

      This test intermittently fails with the following:

      org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest > serverRestartsAfterLocatorReconnects FAILED
          org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.test.dunit.rules.ClusterStartupRule$$Lambda$41/761947362.call in VM 3 running on Host b669312074c0 with 5 VMs
              at org.apache.geode.test.dunit.VM.invoke(VM.java:436)
              at org.apache.geode.test.dunit.VM.invoke(VM.java:405)
              at org.apache.geode.test.dunit.VM.invoke(VM.java:371)
              at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:203)
              at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:196)
              at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:182)
              at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterLocatorReconnects(ClusterConfigLocatorRestartDUnitTest.java:65)
      
              Caused by:
              org.apache.geode.GemFireConfigException: Unable to join the distributed system.  Operation either timed out, was stopped or Locator does not exist.
      

      The detailed test failure shows the following cause:

      Caused by: org.apache.geode.GemFireConfigException: Unable to join the distributed system.  Operation either timed out, was stopped or Locator does not exist.
      	at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.join(GMSMembershipManager.java:661)
      	at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.joinDistributedSystem(GMSMembershipManager.java:747)
      	at org.apache.geode.distributed.internal.membership.gms.Services.start(Services.java:191)
      	at org.apache.geode.distributed.internal.membership.gms.GMSMemberFactory.newMembershipManager(GMSMemberFactory.java:106)
      	at org.apache.geode.distributed.internal.membership.MemberFactory.newMembershipManager(MemberFactory.java:90)
      	at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:1027)
      	at org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:1061)
      	at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:554)
      	at org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:763)
      	at org.apache.geode.distributed.internal.InternalDistributedSystem.newInstance(InternalDistributedSystem.java:355)
      	at org.apache.geode.distributed.internal.InternalDistributedSystem.newInstance(InternalDistributedSystem.java:343)
      	at org.apache.geode.distributed.internal.InternalDistributedSystem.newInstance(InternalDistributedSystem.java:335)
      	at org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:211)
      	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:219)
      	at org.apache.geode.test.junit.rules.ServerStarterRule.startServer(ServerStarterRule.java:172)
      	at org.apache.geode.test.junit.rules.ServerStarterRule.before(ServerStarterRule.java:78)
      	at org.apache.geode.test.dunit.rules.ClusterStartupRule.lambda$startServerVM$a2926408$1(ClusterStartupRule.java:212)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at hydra.MethExecutor.executeObject(MethExecutor.java:244)
      	at org.apache.geode.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:70)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:361)
      	at sun.rmi.transport.Transport$1.run(Transport.java:200)
      	at sun.rmi.transport.Transport$1.run(Transport.java:197)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
      	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
      	... 3 more
      

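      The problem is that after the locator is 'crashed', the test enters a loop to wait for the ClusterConfigurationService to restart. However, this check sometimes happens too quickly after the crash, while the cluster configuration service still appears to be available, so the wait can end before the locator has actually restarted. The server started next is then unable to join the distributed system.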
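
      One way to close this kind of race is to wait for the service to be observed as unavailable first, and only then wait for it to come back. The following is a minimal sketch of that idea, not the actual fix applied to the test. RestartWait, waitUntil, waitForRestart and the isRunning supplier are hypothetical names introduced for illustration; in the real test the supplier would wrap whatever check the test already uses to see whether the cluster configuration service is running.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: waits for a full "down, then up" transition instead of just "up". */
final class RestartWait {

    private RestartWait() {}

    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition was observed to hold, false on timeout.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    /**
     * Waits until isRunning has first reported false (the crash has been noticed)
     * and afterwards reported true (the restart has completed).
     * A stale "still running" answer right after the crash no longer ends the wait.
     */
    static boolean waitForRestart(BooleanSupplier isRunning, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        boolean wentDown = waitUntil(() -> !isRunning.getAsBoolean(), timeoutMillis, pollMillis);
        return wentDown && waitUntil(isRunning, timeoutMillis, pollMillis);
    }
}
```

      A caveat of this sketch: if the service goes down and comes back between two polls, the first phase never sees it as down and only returns after its timeout. Comparing something that changes across restarts would avoid that, if the service exposes such a value.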

            People

              jens.deppe Jens Deppe

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h