Uploaded image for project: 'Geode'
  1. Geode
  2. GEODE-7277

CI failure: DestroyLuceneIndexCommandsDUnitTest > testDestroyIndexesWithOneIndexAndRegionInOneMember failed with RegionDestroyedException suspicious string

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.11.0
    • jmx
    • None

    Description

      DistributedTestOpenJDK11 #1147 (https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/1147) failed with this RegionDestroyedException suspicious string:

      org.apache.geode.cache.lucene.internal.cli.DestroyLuceneIndexCommandsDUnitTest > testDestroyIndexesWithOneIndexAndRegionInOneMember FAILED
          java.lang.AssertionError: Suspicious strings were written to the log during this run.
          Fix the strings or use IgnoredException.addIgnoredException to ignore.
          -----------------------------------------------------------------------
          Found suspect string in log4j at line 2012
      
          [fatal 2019/10/07 21:18:02.334 GMT <FederatingManager4> tid=845] Uncaught exception in thread Thread[FederatingManager4,5,RMI Runtime]
          org.apache.geode.cache.RegionDestroyedException: org.apache.geode.internal.cache.DistributedRegion[path='/_monitoringRegion_172.17.0.10<v2>41002';scope=DISTRIBUTED_NO_ACK';dataPolicy=REPLICATE]
          	at org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7293)
          	at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6157)
          	at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
          	at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6128)
          	at org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2251)
          	at org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
          	at org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
          	at org.apache.geode.management.internal.FederatingManager.access$000(FederatingManager.java:67)
          	at org.apache.geode.management.internal.FederatingManager$RemoveMemberTask.run(FederatingManager.java:564)
          	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          	at java.base/java.lang.Thread.run(Thread.java:834)
      

      This failure has nothing to do with Lucene.

      Some analysis shows that stopping a member and stopping the locator nearly simultaneously can cause this RegionDestroyedException.

      The test output shows:

      The locator starts:

      [vm0] [info 2019/10/07 21:17:55.174 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Starting DistributionManager 172.17.0.10(locator-0:87:locator)<ec><v0>:41000.  (took 26 ms)
      

      Server1 starts and _monitoringRegion_172.17.0.10<v1>41001 is created for it in the locator:

      [vm1] [info 2019/10/07 21:17:59.681 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Starting DistributionManager 172.17.0.10(server-1:95)<v1>:41001.  (took 405 ms)
      [vm0] [info 2019/10/07 21:17:59.684 GMT <FederatingManager1> tid=0x332] Initializing region _monitoringRegion_172.17.0.10<v1>41001
      [vm0] [info 2019/10/07 21:17:59.694 GMT <FederatingManager1> tid=0x332] Initialization of region _monitoringRegion_172.17.0.10<v1>41001 completed
      

      Server2 starts and _monitoringRegion_172.17.0.10<v2>41002 is created for it in the locator:

      [vm2] [info 2019/10/07 21:18:00.379 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Starting DistributionManager 172.17.0.10(server-2:104)<v2>:41002.  (took 361 ms)
      [vm0] [info 2019/10/07 21:18:00.379 GMT <FederatingManager2> tid=0x33c] Initializing region _monitoringRegion_172.17.0.10<v2>41002
      [vm0] [info 2019/10/07 21:18:00.424 GMT <FederatingManager2> tid=0x33c] Initialization of region _monitoringRegion_172.17.0.10<v2>41002 completed
      

      Server2 shuts down:

      [vm2] [info 2019/10/07 21:18:02.007 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Shutting down DistributionManager 172.17.0.10(server-2:104)<v2>:41002. 
      [vm2] [info 2019/10/07 21:18:02.127 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Marking DistributionManager 172.17.0.10(server-2:104)<v2>:41002 as closed.
      

      Server1 shuts down:

      [vm1] [info 2019/10/07 21:18:02.189 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Shutting down DistributionManager 172.17.0.10(server-1:95)<v1>:41001. 
      [vm1] [info 2019/10/07 21:18:02.309 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Marking DistributionManager 172.17.0.10(server-1:95)<v1>:41001 as closed.
      

      The locator logs server1 leaving the distributed system:

      [vm0] [info 2019/10/07 21:18:02.316 GMT <View Message Processor1> tid=0x331] Member at 172.17.0.10(server-1:95)<v1>:41001 gracefully left the distributed cache: departed membership view
      

      Then, the locator starts to shut down:

      [vm0] [info 2019/10/07 21:18:02.322 GMT <RMI TCP Connection(1)-172.17.0.10> tid=0x22] Stopping Distribution Locator on 7f8e3ffa7dc2[34127]
      

      Then, the locator logs server2 leaving the distributed system:

      [vm0] [info 2019/10/07 21:18:02.323 GMT <View Message Processor1> tid=0x331] Member at 172.17.0.10(server-2:104)<v2>:41002 gracefully left the distributed cache: departed membership view
      

      As a results of server2 leaving the distributed system, a RemoveMemberTask is spun off to destroy the region:

      [vm0] [fatal 2019/10/07 21:18:02.334 GMT <FederatingManager4> tid=0x34d] Uncaught exception in thread Thread[FederatingManager4,5,RMI Runtime]
      [vm0] org.apache.geode.cache.RegionDestroyedException: org.apache.geode.internal.cache.DistributedRegion[path='/_monitoringRegion_172.17.0.10<v2>41002';scope=DISTRIBUTED_NO_ACK';dataPolicy=REPLICATE]
      [vm0] 	at org.apache.geode.internal.cache.LocalRegion.checkRegionDestroyed(LocalRegion.java:7293)
      [vm0] 	at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6157)
      [vm0] 	at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1821)
      [vm0] 	at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6128)
      [vm0] 	at org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2251)
      [vm0] 	at org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:452)
      [vm0] 	at org.apache.geode.management.internal.FederatingManager.removeMemberArtifacts(FederatingManager.java:216)
      [vm0] 	at org.apache.geode.management.internal.FederatingManager.access$000(FederatingManager.java:67)
      [vm0] 	at org.apache.geode.management.internal.FederatingManager$RemoveMemberTask.run(FederatingManager.java:564)
      [vm0] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      [vm0] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      [vm0] 	at java.base/java.lang.Thread.run(Thread.java:834)
      

      Immediately after that, the closing message is logged:

      [vm0] [info 2019/10/07 21:18:02.342 GMT <Distribution Locator on 7f8e3ffa7dc2> tid=0x306] GemFireCache[id = 1017379193; isClosing = true; isShutDownAll = false; created = Mon Oct 07 21:17:55 GMT 2019; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
      

      The reason the localDestroyRegion failed is because the monitoringRegion had already been destroyed during cache close.

      Attachments

        Issue Links

          Activity

            People

              klund Kirk Lund
              boglesby Barrett Oglesby
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m