Geode / GEODE-8248

Member hangs waiting for missing disk-stores after gfsh shutdown


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gfsh, persistence
    • Labels: None

    Description

      Let’s say I have 2 servers hosting a simple REPLICATE_PERSISTENT region, and I stop both using the gfsh shutdown command.
      According to the documentation, I should be able to start either server without any problems, since both host the most up-to-date data. In reality, however, the startup hangs with the following output:

      (1) Executing - start server --name=server1 --locators=localhost[10334] --server-port=40401 --cache-xml-file=/temporal/cache.xml
      
      .........
      Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.
      My persistent id:
      
        DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c
        Name: server1
        Location: /temporal/server1/dataStore
      
      Members with potentially new data:
      [
        DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80
        Name: server2
        Location: /temporal/server2/dataStore
      ]
      
      
      "main" #1 prio=5 os_prio=31 tid=0x00007f9b28809000 nid=0x1003 in Object.wait() [0x000070000ab04000]
         java.lang.Thread.State: TIMED_WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at org.apache.geode.internal.cache.persistence.MembershipChangeListener.waitForChange(MembershipChangeListener.java:62)
      	- locked <0x0000000719df55e0> (a org.apache.geode.internal.cache.persistence.MembershipChangeListener)
      	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.waitForMembershipChangeForMissingDiskStores(PersistenceInitialImageAdvisor.java:218)
      	at org.apache.geode.internal.cache.persistence.PersistenceInitialImageAdvisor.getAdvice(PersistenceInitialImageAdvisor.java:118)
      	at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:835)
      	at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
      	at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1196)
      	at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1076)
      	at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3043)
      	at org.apache.geode.pdx.internal.PeerTypeRegistration.initialize(PeerTypeRegistration.java:198)
      	at org.apache.geode.pdx.internal.TypeRegistry.initialize(TypeRegistry.java:116)
      	at org.apache.geode.internal.cache.GemFireCacheImpl.initializePdxRegistry(GemFireCacheImpl.java:1449)
      	- locked <0x00000005c0593168> (a org.apache.geode.internal.cache.GemFireCacheImpl)
      	at org.apache.geode.internal.cache.xmlcache.CacheCreation.create(CacheCreation.java:511)
      	at org.apache.geode.internal.cache.xmlcache.CacheXmlParser.create(CacheXmlParser.java:337)
      	at org.apache.geode.internal.cache.GemFireCacheImpl.loadCacheXml(GemFireCacheImpl.java:4272)
      	at org.apache.geode.internal.cache.GemFireCacheImpl.initializeDeclarativeCache(GemFireCacheImpl.java:1388)
      	at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1208)
      	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:207)
      	- locked <0x00000005c016a108> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
      	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
      	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:164)
      	- locked <0x00000005c0043de0> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
      	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:139)
      	at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
      	at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:869)
      	at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:786)
      	at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
      	at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
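      The wait message names the disk stores that the starting member believes hold newer data (here, server2's). The gfsh "show missing-disk-stores" command is the built-in way to list these. Purely as an illustration, the Java sketch below pulls the IDs out of a startup log in the layout shown above. MissingDiskStoreParser is a hypothetical helper, not part of Geode, and it assumes exactly this log format (one "DiskStore ID:" line per member inside the bracketed list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Geode): extracts the disk-store IDs listed under
 * "Members with potentially new data:" from a startup log shaped like the one in this report.
 */
public class MissingDiskStoreParser {

  // Disk-store IDs in the log are 36-character UUIDs.
  private static final Pattern DISK_STORE_ID =
      Pattern.compile("DiskStore ID:\\s*([0-9a-fA-F-]{36})");

  /** Returns the IDs of members that the waiting member believes hold newer data. */
  public static List<String> membersWithNewerData(String log) {
    List<String> ids = new ArrayList<>();
    int start = log.indexOf("Members with potentially new data:");
    if (start < 0) {
      return ids; // no wait message in this log
    }
    int open = log.indexOf('[', start);
    if (open < 0) {
      return ids; // list never opened
    }
    int close = log.indexOf(']', open);
    if (close < 0) {
      return ids; // list never closed; report nothing rather than guess
    }
    // Only scan inside the brackets, so the member's own "My persistent id" is skipped.
    Matcher m = DISK_STORE_ID.matcher(log.substring(open, close));
    while (m.find()) {
      ids.add(m.group(1));
    }
    return ids;
  }

  public static void main(String[] args) {
    String log = String.join("\n",
        "Region /TestRegion has potentially stale data. It is waiting for another member to recover the latest data.",
        "My persistent id:",
        "  DiskStore ID: 4d1abaf3-677d-4c52-b3f8-681e051f143c",
        "  Name: server1",
        "Members with potentially new data:",
        "[",
        "  DiskStore ID: 163dfaf7-a680-4154-a278-8cec40d57d80",
        "  Name: server2",
        "]");
    System.out.println(membersWithNewerData(log));
  }
}
```

      Scanning only between the brackets keeps the starting member's own ID out of the result.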
      

      We should either fix the problem, making sure the members fully synchronise their data during the shutdown process so they don't have to wait on each other, or, if this is the expected behaviour, update the documentation accordingly.
      The attached zip file contains a simple script that reproduces the issue; after downloading and uncompressing it, the only thing that needs to be changed is the GEMFIRE environment variable.
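      To make the expected behaviour concrete, here is a deliberately simplified Java model of the startup check, assuming (as the report's reading of the documentation implies) that a coordinated gfsh shutdown should leave each member recording its peers as holding equal data. This is not Geode's implementation: PersistentRecoveryModel, RecordedView and their methods are hypothetical names for illustration only. A member must wait for any peer it recorded as possibly newer that is not currently running; peers recorded as equal never block it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified, hypothetical model of the startup check for a persistent replicated
 * region. Not Geode's actual implementation; all names are illustrative.
 */
public class PersistentRecoveryModel {

  /** What one member recorded about its peers in its own disk store. */
  static final class RecordedView {
    final Set<String> possiblyNewer = new HashSet<>(); // peers that may hold newer data
    final Set<String> equal = new HashSet<>();         // peers known to hold the same data
  }

  /** Peers this member must wait for before it can recover from its own disk store. */
  static Set<String> mustWaitFor(RecordedView view, Set<String> runningMembers) {
    Set<String> waiting = new HashSet<>(view.possiblyNewer);
    waiting.removeAll(runningMembers); // a running peer can supply the latest data
    waiting.removeAll(view.equal);     // a peer recorded as equal never blocks startup
    return Collections.unmodifiableSet(waiting);
  }

  /** Assumed effect of a coordinated shutdown: every peer is recorded as equal. */
  static void recordCoordinatedShutdown(RecordedView view, Set<String> peers) {
    view.possiblyNewer.removeAll(peers);
    view.equal.addAll(peers);
  }

  public static void main(String[] args) {
    Set<String> nobodyRunning = Set.of();

    // Observed in this report: server1 still lists server2 as possibly newer, so it waits.
    RecordedView server1 = new RecordedView();
    server1.possiblyNewer.add("server2");
    System.out.println("observed: server1 waits for " + mustWaitFor(server1, nobodyRunning));

    // Expected per the documentation: shutdown recorded server2 as equal, so server1 starts alone.
    recordCoordinatedShutdown(server1, Set.of("server2"));
    System.out.println("expected: server1 waits for " + mustWaitFor(server1, nobodyRunning));
  }
}
```

      In this model the fix proposed above corresponds to making sure recordCoordinatedShutdown actually happens on every member before the shutdown completes.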

      Attachments

        1. temporal.zip (2 kB, Juan Ramos)


          People

            Assignee: Unassigned
            Reporter: Juan Ramos (jjramos)
            Votes: 0
            Watchers: 4
